TransMLA-base
Base Model for TransMLA
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11, 2025 • 57
fxmeng/transmla_pretrain_6B_tokens Viewer • Updated Jul 5, 2025 • 5.94M • 230
fxmeng/transmla_pretrain_1B_tokens Viewer • Updated Jul 5, 2025 • 1.14M • 113
fxmeng/transmla_pretrain_100m_tokens Viewer • Updated Jul 5, 2025 • 100k • 35
CLOVER-Commonsense-148k
CLOVER: Constrained Learning with Orthonormal Vectors for Eliminating Redundancy Paper • 2411.17426 • Published Nov 26, 2024
fxmeng/CLOVER-llama-2-7b-commonsense-148k 7B • Updated Feb 2, 2025 • 4
fxmeng/CLOVER-llama-13b-commonsense-148k 13B • Updated Feb 2, 2025 • 4
fxmeng/CLOVER-llama-3-8b-commonsense-148k 8B • Updated Feb 2, 2025 • 3