Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
m-ricย 
posted an update Nov 11, 2024
Post
790
๐—”๐—ฟ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐—น๐—ฎ๐˜„๐˜€ ๐—ผ๐˜ƒ๐—ฒ๐—ฟ? ๐—” ๐—ฟ๐—ฒ๐—ฝ๐—ผ๐—ฟ๐˜ ๐—ณ๐—ฟ๐—ผ๐—บ ๐˜๐—ต๐—ฒ ๐—œ๐—ป๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—ฎ๐—ป๐—ป๐—ผ๐˜‚๐—ป๐—ฐ๐—ฒ๐—ฑ ๐˜๐—ต๐—ฎ๐˜ ๐—ข๐—ฝ๐—ฒ๐—ป๐—”๐—œ ๐—ถ๐˜€ ๐˜€๐—ฒ๐—ฒ๐—ถ๐—ป๐—ด ๐—ฑ๐—ถ๐—บ๐—ถ๐—ป๐—ถ๐˜€๐—ต๐—ถ๐—ป๐—ด ๐—ฟ๐—ฒ๐˜๐˜‚๐—ฟ๐—ป๐˜€ ๐—ณ๐—ฟ๐—ผ๐—บ ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐˜‚๐—ฝ ๐˜๐—ต๐—ฒ ๐—ป๐—ฒ๐˜…๐˜ ๐—š๐—ฃ๐—ง ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€.

๐Ÿ“Š What are scaling laws? These are empiric laws that say "Every time you increase compute spent in training 10-fold, your LLM's performance will go up by a predictable tick". Of course, they apply only if you train your model with the right methods.

The image below illustrates it: they're from a paper by Google, "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation", and they show how quality and instruction following of models improve when you scale the model up (which is equivalent to scaling up the compute spent in training).

โžก๏ธ These scaling laws have immense impact: they triggered the largest gold rush ever, with companies pouring billions into scaling up theiur training. Microsoft and OpenAI spent 100B into their "Startgate" mega training cluster, due to start running in 2028.

๐Ÿค” So, what about these reports of scaling laws slowing down?

If they are true, they would mean a gigantic paradigm shift, as the hundreds of billions poured by AI companies into scaling could be a dead-end. โ›”๏ธ

But I doubt it: until the most recent publications, scaling laws showed no signs of weakness, and the researchers at the higher end of the scale-up seems to imply the scaling up continues.

Wait and see!

Shouldn't the takeaway from scaling laws be mostly negative?
The fact that scaling compute so much improves output quality by so little seems unintuitive.

One could argue that this is still positive, as there is still room to grow, but I see it as much more exciting to see some new training technique in action, or good results on smaller compute training.