Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published 19 days ago • 44
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published 22 days ago • 158
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 174
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning Paper • 2510.02240 • Published Oct 2 • 17
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23 • 55
VideoLucy: Deep Memory Backtracking for Long Video Understanding Paper • 2510.12422 • Published Oct 14 • 1
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Paper • 2506.03144 • Published Jun 3 • 7
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras Paper • 2507.17664 • Published Jul 23 • 1
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase Paper • 2309.05573 • Published Sep 11, 2023 • 2
The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation Paper • 2307.15061 • Published Jul 27, 2023 • 1