Junjie Lu

Lux0926

AI & ML interests

None yet

Recent Activity

updated a dataset about 2 months ago

Lux0926/Qwen2-7B-SFT-CGPO-10k

updated a dataset about 2 months ago

Lux0926/Qwen1.5-32B-SFT-CGPO-10k

updated a dataset about 2 months ago

Lux0926/DeepSeekMath-Base-7B-SFT-CGPO-10k

upvoted an article about 2 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 255

upvoted 2 papers 8 months ago

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

Paper • 2504.15843 • Published Apr 22 • 16

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Paper • 2503.22230 • Published Mar 28 • 45

upvoted a paper 10 months ago

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Paper • 2502.13943 • Published Feb 19 • 8