-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 4 -
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Paper • 2202.08906 • Published • 2 -
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 39
Collections
Discover the best community collections!
Collections including paper arxiv:2202.08906
-
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
Paper • 2009.10622 • Published • 1 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 48 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 70 -
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Paper • 2401.14361 • Published • 2
-
Mixtral of Experts
Paper • 2401.04088 • Published • 157 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 48 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 70 -
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper • 2308.14352 • Published
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 4 -
Sparse Networks from Scratch: Faster Training without Losing Performance
Paper • 1907.04840 • Published • 3 -
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Paper • 1910.02054 • Published • 4 -
A Mixture of h-1 Heads is Better than h Heads
Paper • 2005.06537 • Published • 2
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 4 -
Sparse Networks from Scratch: Faster Training without Losing Performance
Paper • 1907.04840 • Published • 3 -
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Paper • 1910.02054 • Published • 4 -
A Mixture of h-1 Heads is Better than h Heads
Paper • 2005.06537 • Published • 2
-
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 26 -
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Paper • 2308.12066 • Published • 4 -
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Paper • 2303.06182 • Published • 1 -
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
Paper • 2112.14397 • Published • 1