Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 143
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 139
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88
Collections
Collections including paper arxiv:2507.16075
-
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
Paper • 2509.06283 • Published • 17
Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
Text Generation • 31B • Updated • 15.3k • 781
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Paper • 2506.11763 • Published • 72
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70
-
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Paper • 2412.09013 • Published • 13
Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 67
∇NABLA: Neighborhood Adaptive Block-Level Attention
Paper • 2507.13546 • Published • 124
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 87
-
Robust Multimodal Large Language Models Against Modality Conflict
Paper • 2507.07151 • Published • 5
One Token to Fool LLM-as-a-Judge
Paper • 2507.08794 • Published • 31
Test-Time Scaling with Reflective Generative Model
Paper • 2507.01951 • Published • 107
KV Cache Steering for Inducing Reasoning in Small Language Models
Paper • 2507.08799 • Published • 40
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 510 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Energy-Based Transformers are Scalable Learners and Thinkers
Paper • 2507.02092 • Published • 69
MOSPA: Human Motion Generation Driven by Spatial Audio
Paper • 2507.11949 • Published • 24
Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations
Paper • 2507.09751 • Published • 1
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Paper • 2507.07982 • Published • 33
-
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 133
Magistral
Paper • 2506.10910 • Published • 65
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Paper • 2506.07240 • Published • 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper • 2506.09991 • Published • 55
-
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper • 2503.14734 • Published • 5
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper • 2401.02117 • Published • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper • 2506.01844 • Published • 143
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 88