Can We Train an MoE Model with the Same Total Parameters and Performance as Dense?
An ICLR 2026 Oral paper explainer: MoE needs a more aggressive data scaling strategy.
Tags: MoE · Pretrain · LLM · ICLR 2026 oral · Data Scaling · Data Reuse