Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.04625

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6 • 21
Learning from Failures in Multi-Attempt Reinforcement Learning

Paper • 2503.04808 • Published Mar 4 • 18
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published Mar 6 • 72
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published Mar 10 • 47

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published Feb 20 • 47
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

Paper • 2502.12853 • Published Feb 18 • 29
Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published Feb 14 • 18
Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 47

Reasoning Models

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

Paper • 2502.07374 • Published Feb 11 • 40
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 152
S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published Feb 20 • 63

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 115
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 84
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Paper • 2412.15084 • Published Dec 19, 2024 • 13
The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
Towards an AI co-scientist

Paper • 2502.18864 • Published Feb 26 • 51
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 192

RLMs (Reasoning Language Models)

LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

Paper • 2503.00735 • Published Mar 2 • 23
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published Mar 7 • 27
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published Mar 7 • 38

Towards an AI co-scientist

Paper • 2502.18864 • Published Feb 26 • 51
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 286
Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 115
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published Mar 24 • 31

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6 • 21
Learning from Failures in Multi-Attempt Reinforcement Learning

Paper • 2503.04808 • Published Mar 4 • 18
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
Towards an AI co-scientist

Paper • 2502.18864 • Published Feb 26 • 51
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 192

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published Mar 6 • 72
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published Mar 10 • 47

RLMs (Reasoning Language Models)

LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

Paper • 2503.00735 • Published Mar 2 • 23
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published Mar 7 • 27
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published Mar 7 • 38

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published Feb 20 • 47
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

Paper • 2502.12853 • Published Feb 18 • 29
Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published Feb 14 • 18
Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 47

Towards an AI co-scientist

Paper • 2502.18864 • Published Feb 26 • 51
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

Reasoning Models

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

Paper • 2502.07374 • Published Feb 11 • 40
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 152
S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published Feb 20 • 63

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 115
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 84
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Paper • 2412.15084 • Published Dec 19, 2024 • 13
The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 286
Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 115
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published Mar 24 • 31

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs