OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 92
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9, 2025 • 101
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute Paper • 2509.04475 • Published Aug 30, 2025 • 3
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 95
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published Nov 27, 2025 • 85
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models Paper • 2512.07843 • Published Nov 24, 2025 • 21
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10, 2025 • 30
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16, 2025 • 41
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models Paper • 2410.09671 • Published Oct 12, 2024 • 1
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 18 days ago • 201
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published 17 days ago • 48
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published 13 days ago • 60
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published 13 days ago • 14
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11, 2025 • 33
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization Paper • 2511.12982 • Published Nov 17, 2025 • 3
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark? Paper • 2509.07894 • Published Sep 9, 2025 • 31
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published 5 days ago • 49