- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- SafeArena: Evaluating the Safety of Autonomous Web Agents
  Paper • 2503.04957 • Published • 21
- Learning from Failures in Multi-Attempt Reinforcement Learning
  Paper • 2503.04808 • Published • 18
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113

Collections
Collections including paper arxiv:2503.04625
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
  Paper • 2503.04724 • Published • 72
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
  Paper • 2503.07572 • Published • 47

- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
  Paper • 2502.14768 • Published • 47
- S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
  Paper • 2502.12853 • Published • 29
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 18
- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 47

- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
  Paper • 2502.07374 • Published • 40
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 152
- S*: Test Time Scaling for Code Generation
  Paper • 2502.14382 • Published • 63

- Evolving Deeper LLM Thinking
  Paper • 2501.09891 • Published • 115
- ProcessBench: Identifying Process Errors in Mathematical Reasoning
  Paper • 2412.06559 • Published • 84
- AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
  Paper • 2412.15084 • Published • 13
- The Lessons of Developing Process Reward Models in Mathematical Reasoning
  Paper • 2501.07301 • Published • 99

- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- Towards an AI co-scientist
  Paper • 2502.18864 • Published • 51
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 192

- LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
  Paper • 2503.00735 • Published • 23
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
  Paper • 2503.05379 • Published • 38

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 286
- Evolving Deeper LLM Thinking
  Paper • 2501.09891 • Published • 115
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
  Paper • 2503.18892 • Published • 31