NFTCID
0 followers · 8 following
AI & ML interests
None yet
Recent Activity
reacted to m-ric's post about 22 hours ago
MiniMax's new MoE LLM reaches Claude-Sonnet level with 4M tokens context length

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

Key insights:

MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%)

Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here: https://huggingface.co/papers/2501.08313
Model here, allows commercial use <100M monthly users: https://huggingface.co/MiniMaxAI/MiniMax-Text-01
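To make the hybrid layout described in the post concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not MiniMax's implementation: the function name `layer_schedule`, the 16-layer depth, and the choice to put the softmax layer at every 8th position are hypothetical, since the post only says softmax attention is used "every 8 layers". The parameter counts (456B total, 45.9B activated) are taken from the post.

```python
def layer_schedule(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Assign an attention type to each layer.

    Assumption: every `softmax_every`-th layer (8th, 16th, ...) uses
    softmax attention; all other layers use lightning (linear) attention.
    """
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


# Hypothetical depth chosen only for illustration.
schedule = layer_schedule(16)

# Figures quoted in the post: total vs. activated parameters per token.
total_params = 456e9
active_params = 45.9e9
active_fraction = active_params / total_params

print(schedule)
print(f"activated fraction per token: {active_fraction:.1%}")
```

Under these assumptions, seven of every eight layers use linear-complexity attention, and roughly a tenth of the parameters are active for any given token.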
liked a Space 3 days ago
akhaliq/anychat
liked a model 15 days ago
ibm-granite/granite-3.1-8b-instruct
Organizations
None yet
NFTCID's activity
liked a Space 3 days ago
Anychat • Running on CPU Upgrade • 1.39k
liked 2 models 15 days ago
ibm-granite/granite-3.1-8b-instruct • Text Generation • Updated about 1 month ago • 47.6k • 123
PowerInfer/SmallThinker-3B-Preview • Text Generation • Updated 3 days ago • 57.4k • 356
liked a dataset 15 days ago
agibot-world/AgiBotWorld-Alpha • Viewer • Updated 3 days ago • 19.7M • 14.2k • 161
liked a model 3 months ago
genmo/mochi-1-preview • Text-to-Video • Updated about 1 month ago • 40.5k • 1.15k
liked a model 6 months ago
black-forest-labs/FLUX.1-schnell • Text-to-Image • Updated Aug 16, 2024 • 631k • 3.24k
liked a model about 1 year ago
ibm/re2g-reranker-trex • Text Classification • Updated May 16, 2023 • 1k • 7
liked a dataset about 1 year ago
Yelp/yelp_review_full • Viewer • Updated Jan 4, 2024 • 700k • 13.5k • 107