Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models Paper • 2510.11683 • Published Oct 13, 2025 • 14
DeepPrune: Parallel Scaling without Inter-trace Redundancy Paper • 2510.08483 • Published Oct 9, 2025 • 24
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8, 2025 • 195
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23, 2025 • 56
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following Paper • 2506.09942 • Published Jun 11, 2025 • 5
VerIF Collection RL trained models and datasets for instruction-following • 7 items • Updated Jun 12, 2025 • 5