ZClip: Adaptive Spike Mitigation for LLM Pre-Training • Paper • 2504.02507 • Published Apr 3, 2025
A Refined Analysis of Massive Activations in LLMs • Paper • 2503.22329 • Published Mar 28, 2025
Variance Control via Weight Rescaling in LLM Pre-training • Paper • 2503.17500 • Published Mar 21, 2025