Coupled Variational Reinforcement Learning for Language Model General Reasoning Paper • 2512.12576 • Published 20 days ago • 2
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 244
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2, 2025 • 83
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks Paper • 2504.12764 • Published Apr 17, 2025 • 41
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published Apr 14, 2025 • 85
Scalable Oversight for Superhuman AI via Recursive Self-Critiquing Paper • 2502.04675 • Published Feb 7, 2025 • 1
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models Paper • 2503.18034 • Published Mar 23, 2025 • 1
SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency Paper • 2502.02458 • Published Feb 4, 2025 • 1
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published Apr 1, 2025 • 26
The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLM on large GPU Clusters • 3.62k
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models Paper • 2502.01142 • Published Feb 3, 2025 • 24
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Paper • 2501.01830 • Published Jan 3, 2025 • 17
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18, 2024 • 24