ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27
How Much 3D Do Video Foundation Models Encode?
Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360-Degree Firefighting Videos