Wenhan Ma
CuteNPC
AI & ML interests
Large Language Model
Recent Activity
upvoted a paper 13 days ago
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
liked a model 28 days ago
Lansechen/deepseek-v2-lite-16b-chat-R1-Distill-bs17k-batch32
authored a paper about 2 months ago
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
Organizations
None yet