P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17 • 133
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11 • 37
Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Paper • 2406.10638 • Published Jun 15, 2024
MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos Paper • 2502.12558 • Published Feb 18
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval Paper • 2502.11431 • Published Feb 17
Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification Paper • 2506.19225 • Published Jun 24
TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos Paper • 2509.26360 • Published Sep 30
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22 • 160
Video Understanding with Large Language Models: A Survey Paper • 2312.17432 • Published Dec 29, 2023 • 3
DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks Paper • 2307.05628 • Published Jul 11, 2023 • 10
Cross Contrasting Feature Perturbation for Domain Generalization Paper • 2307.12502 • Published Jul 24, 2023
Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering Paper • 2402.00827 • Published Feb 1, 2024 • 2