Optimizer_Papers
Adam-mini: Use Fewer Learning Rates To Gain More • Paper • 2406.16793 • Published Jun 24 • 67
MoE_Papers
A Closer Look into Mixture-of-Experts in Large Language Models • Paper • 2406.18219 • Published Jun 26 • 15