Houyi Li

My papers.

Paper explainers and research notes on my own work.

Pinned Paper Explainer · P1 · 30 min read

Can We Train an MoE Model with the Same Total Parameters and Performance as Dense?

An ICLR 2026 Oral paper explainer: MoE needs a more aggressive data scaling strategy.

MoE · Pretrain · LLM · ICLR 2026 Oral · Data Scaling · Data Reuse
Appendix · P2 · 25 min read

Appendix: Can We Train an MoE Model with the Same Total Parameters and Performance as Dense?

An ICLR 2026 Oral paper explainer: MoE needs a more aggressive data scaling strategy.

MoE · Pretrain · LLM · ICLR 2026 Oral · Data Scaling · Data Reuse · Appendix