Lifting the Curse of Capacity Gap in Distilling Language Models
minimoe-3L-384H is a MiniMoE model distilled from bert-base-uncased on Wikipedia.
Repository: https://github.com/GeneZC/MiniMoE
arXiv: https://arxiv.org/abs/2305.12129