BlackMamba: Mixture of Experts for State-Space Models
Abstract
State-space models (SSMs) have recently demonstrated performance competitive with transformers on large-scale language modeling benchmarks while achieving linear time and memory complexity as a function of sequence length. Mamba, a recently released SSM, shows impressive performance in both language modeling and long-sequence processing tasks. Simultaneously, mixture-of-experts (MoE) models have shown remarkable performance while significantly reducing the compute and latency costs of inference at the expense of a larger memory footprint. In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both. We demonstrate that BlackMamba performs competitively against both Mamba and transformer baselines, and outperforms them in inference and training FLOPs. We fully train and open-source 340M/1.5B and 630M/2.8B BlackMamba models on 300B tokens of a custom dataset. We show that BlackMamba inherits and combines the benefits of both the SSM and MoE architectures: linear-complexity generation from the SSM and cheap, fast inference from the MoE. We release all weights, checkpoints, and inference code open-source. Inference code at: https://github.com/Zyphra/BlackMamba
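To make the combination concrete, below is a minimal, self-contained sketch (not the authors' implementation) of how an SSM-plus-MoE block could be composed: a sequence-mixing module standing in for the Mamba SSM block, followed by a top-1 routed mixture-of-experts MLP, each wrapped in a pre-norm residual connection. The `SequenceMixerStub` class, the top-1 routing, and all dimensions here are illustrative assumptions; the released repository linked above contains the actual architecture.

```python
# Hypothetical sketch of a BlackMamba-style block: SSM-style sequence mixing
# alternated with a sparsely routed (top-1) mixture-of-experts MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequenceMixerStub(nn.Module):
    """Stand-in for the Mamba SSM block (token mixing along the sequence).
    A real model would use an SSM implementation, e.g. the mamba_ssm package."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        return self.proj(x)


class ExpertMLP(nn.Module):
    """One feed-forward expert."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))


class Top1MoE(nn.Module):
    """Switch-style MoE MLP: each token is routed to its single best expert,
    so only a fraction of the MLP parameters are active per token."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(ExpertMLP(d_model, d_ff) for _ in range(n_experts))

    def forward(self, x):  # x: (batch, seq, d_model)
        b, s, d = x.shape
        flat = x.reshape(b * s, d)
        probs = F.softmax(self.router(flat), dim=-1)   # routing probabilities
        weight, idx = probs.max(dim=-1)                # top-1 expert per token
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):      # loop over experts for clarity, not speed
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(flat[mask])
        return out.reshape(b, s, d)


class BlackMambaStyleBlock(nn.Module):
    """Pre-norm residual block: SSM sequence mixer followed by a routed MoE MLP."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = SequenceMixerStub(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.moe = Top1MoE(d_model, d_ff, n_experts)

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))  # sequence mixing (a Mamba SSM block in the paper)
        x = x + self.moe(self.norm2(x))    # sparse expert MLP (only the selected expert runs)
        return x


if __name__ == "__main__":
    block = BlackMambaStyleBlock(d_model=64, d_ff=256, n_experts=8)
    y = block(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

The sketch illustrates why the combination is attractive: the sequence mixer avoids the quadratic cost of attention, while the router ensures each token pays for only one expert's MLP FLOPs at inference time, at the cost of storing all experts in memory.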
Community
Similar paper published last month: https://arxiv.org/pdf/2401.04081.pdf
Yep! Concurrent work and we actually trained a model.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (2024)
- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention (2023)
- Gated Linear Attention Transformers with Hardware-Efficient Training (2023)
- Mixtral of Experts (2024)
- Fast Inference of Mixture-of-Experts Language Models with Offloading (2023)