Ermakov Petr
ermakovpetr
AI & ML interests
LLM, Search, Diffusion
Recent Activity
liked a model about 12 hours ago: yandex/YandexGPT-5-Lite-8B-pretrain
reacted to artnitolog's post with 🤝 9 days ago
Recently, we open-sourced YaFSDP, Yandex’s tool for efficient distributed training of LLMs.
Here are some of the key ideas used in YaFSDP to provide speedup and memory savings over FSDP:
• Allocate just two buffers and reuse them across the whole transformer for all gathered weights, bypassing the torch memory allocator;
• Gather small normalization layers at the beginning of the iteration and average the gradients only at the end;
• Move gradient division to the very end of the backward pass.
To learn more about how YaFSDP works, check out our latest blog post: https://medium.com/yandex/yafsdp-a-tool-for-faster-llm-training-and-optimized-gpu-utilization-is-no-632b7539f5b3
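The third idea in the list above, moving gradient division to the end, can be illustrated with plain arithmetic. The sketch below is not YaFSDP code and does not use its API; the function names and the toy gradient values are made up for illustration, and Python lists stand in for gradient tensors. It only shows that summing per-worker gradients and dividing once gives the same average as dividing each contribution before summing, while performing fewer divisions.

```python
# Illustrative sketch only: hypothetical helper names, toy data, no YaFSDP API.

def average_with_early_division(worker_grads):
    """Divide every worker's gradient by the worker count before summing."""
    n = len(worker_grads)
    total = [0.0] * len(worker_grads[0])
    for grad in worker_grads:
        for i, g in enumerate(grad):
            total[i] += g / n  # one division per element per worker
    return total


def average_with_late_division(worker_grads):
    """Sum all workers' gradients first, then divide once at the very end."""
    n = len(worker_grads)
    total = [0.0] * len(worker_grads[0])
    for grad in worker_grads:
        for i, g in enumerate(grad):
            total[i] += g
    return [t / n for t in total]  # one division per element in total


# Four hypothetical workers, each holding a two-element gradient.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
early = average_with_early_division(grads)
late = average_with_late_division(grads)
```

Both functions return the same averaged gradient for this input; the late variant simply defers the scaling to a single final step.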
reacted to artnitolog's post with 🤗 9 days ago
Organizations
models
None public yet
datasets
None public yet