Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models • Paper 2502.15499 • Published Feb 2025
HMoE: Heterogeneous Mixture of Experts for Language Modeling • Paper 2408.10681 • Published Aug 20, 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent • Paper 2411.02265 • Published Nov 4, 2024