EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning • Paper • 2509.22576 • Published Sep 26, 2025 • 134
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say • Paper • 2509.21164 • Published Sep 25, 2025 • 8
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models • Paper • 2505.14810 • Published May 20, 2025 • 62