My papers.

Paper explainers and research notes for my own work.

Pinned Paper Explainer · P1 · 30 min read

Can We Train an MoE Model with the Same Total Parameters and Performance as Dense?

An ICLR 2026 Oral paper explainer: MoE needs a more aggressive data scaling strategy.

MoE · Pretrain · LLM · ICLR 2026 oral · Data Scaling · Data Reuse

Appendix · P2 · 25 min read

Appendix: Can We Train an MoE Model with the Same Total Parameters and Performance as Dense?

An ICLR 2026 Oral paper explainer: MoE needs a more aggressive data scaling strategy.

MoE · Pretrain · LLM · ICLR 2026 oral · Data Scaling · Data Reuse · Appendix
