Bhargav Solanki

solankibhargav

AI & ML interests

None yet

Recent Activity

Organizations

Abacus Research AG

solankibhargav's activity

reacted to m-ric's post with 👍 about 1 month ago
š‡š®š š š¢š§š  š…šššœšž š«šžš„šžššš¬šžš¬ šš¢šœšØš­š«šØš§, šš š¦š¢šœš«šØš¬šœšØš©š¢šœ š„š¢š› š­š”ššš­ š¬šØš„šÆšžš¬ š‹š‹šŒ š­š«ššš¢š§š¢š§š  šŸ’šƒ š©ššš«ššš„š„šžš„š¢š³ššš­š¢šØš§ šŸ„³

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons"

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.
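
As a rough check of the arithmetic above, here is a minimal sketch. It assumes "24k" means exactly 24,000 GPUs, that every GPU is busy for the whole run, and that scaling is perfectly linear. None of these assumptions comes from the post itself, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the GPU-hour figures quoted in the post.
# Assumptions (not from the post): exactly 24,000 GPUs, perfect linear
# scaling, and no downtime or restarts.

HOURS_PER_YEAR = 24 * 365.25


def gpu_hours_to_years(gpu_hours: float) -> float:
    """Wall-clock years if a single GPU did all the work."""
    return gpu_hours / HOURS_PER_YEAR


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Wall-clock days if the work is split evenly across num_gpus."""
    return gpu_hours / num_gpus / 24


single_gpu_years = gpu_hours_to_years(39e6)
cluster_days = wall_clock_days(39e6, 24_000)

print(f"Single GPU: about {single_gpu_years:,.0f} years")
print(f"24,000 GPUs: about {cluster_days:.0f} days")
```

Under these idealized assumptions the run comes out at a bit over two months; real runs take longer because of restarts, evaluation, and imperfect scaling.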

🤏 But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. One team built Nanotron, already widely used in industry.
And now a team has released Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real feat of engineering that makes it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful:
Measured in MFU (Model FLOPs Utilization, the fraction of the hardware's peak compute that training actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this.)
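
To make the MFU figure concrete, here is a minimal sketch of how it is commonly estimated. It assumes the usual approximation of 6 FLOPs per parameter per token (forward plus backward, ignoring attention) and a dense BF16 peak of roughly 989 TFLOP/s per H100 SXM. Both numbers are assumptions on my part, not values from the post or from Picotron's code, and the function names are hypothetical.

```python
# Rough MFU estimate. This is NOT Picotron's implementation.
# Assumption: ~6 FLOPs per parameter per token (forward + backward),
# attention FLOPs ignored.
FLOPS_PER_PARAM_PER_TOKEN = 6

# Assumption: dense BF16 peak of one H100 SXM, roughly 989 TFLOP/s.
H100_PEAK_FLOPS = 989e12


def mfu(num_params, tokens_per_second, num_gpus,
        peak_flops_per_gpu=H100_PEAK_FLOPS):
    """Fraction of the cluster's peak FLOP/s spent on model math."""
    achieved = FLOPS_PER_PARAM_PER_TOKEN * num_params * tokens_per_second
    return achieved / (num_gpus * peak_flops_per_gpu)


def tokens_per_second_for(target_mfu, num_params, num_gpus,
                          peak_flops_per_gpu=H100_PEAK_FLOPS):
    """Throughput that a given MFU would imply under the same assumptions."""
    peak_total = num_gpus * peak_flops_per_gpu
    return target_mfu * peak_total / (FLOPS_PER_PARAM_PER_TOKEN * num_params)


# Throughput implied by ~50% MFU for a 1.7B-parameter model on 8 GPUs.
# This is an illustration of the formula, not a measured result.
required = tokens_per_second_for(0.5, 1.7e9, 8)
print(f"Implied throughput: about {required:,.0f} tokens/s")
```

Under these assumptions, ~50% MFU on 8 H100s corresponds to roughly 388k tokens per second for a 1.7B-parameter model.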

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
New activity in llava-hf/vip-llava-13b-hf about 2 months ago

Support for vllm/lmdeploy?

#1 opened about 2 months ago by solankibhargav
New activity in THUDM/glm-edge-v-5b about 2 months ago

No support on vllm/lmdeploy

#1 opened about 2 months ago by solankibhargav
New activity in google/gemma-2-27b-it about 2 months ago
New activity in stepfun-ai/GOT-OCR2_0 4 months ago
New activity in AbacusResearch/Jallabi-34B 5 months ago
New activity in cognitivecomputations/dolphin-vision-72b 7 months ago

Steps to fine tune?

#5 opened 7 months ago by solankibhargav
New activity in mistralai/Codestral-22B-v0.1 8 months ago