Can We Train an MoE Model with the Same Total Parameters and Performance as Dense?
An ICLR 2026 Oral paper explainer: MoE needs a more aggressive data scaling strategy.
Tags: MoE · Pretrain · LLM · ICLR 2026 oral · Data Scaling · Data Reuse