Questions?

#84 opened by nouamanetazi (Nanotron Research org)

If you have any questions about the content of the blog, feel free to ask here!

Hi, I'm just a newbie trying to learn about training LLMs, so this might be a dumb question.
Can anyone explain how the batch size affects throughput (tokens per second)? And why does a larger batch size tend to make less use of each training token, slowing convergence and potentially wasting compute?
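For concreteness, here is a minimal sketch (a toy PyTorch model with synthetic token batches; all names and sizes are placeholders, not the blog's actual setup) of how training throughput in tokens per second could be measured at different batch sizes:

```python
import time
import torch
import torch.nn as nn

# Toy stand-in for an LLM: embedding followed by a linear head.
vocab_size, seq_len, hidden = 1000, 128, 256
model = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def tokens_per_second(batch_size: int, steps: int = 20) -> float:
    """Time a few training steps and report tokens processed per second."""
    start = time.perf_counter()
    for _ in range(steps):
        tokens = torch.randint(0, vocab_size, (batch_size, seq_len))
        logits = model(tokens)                                  # (batch, seq, vocab)
        loss = loss_fn(logits.reshape(-1, vocab_size), tokens.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    elapsed = time.perf_counter() - start
    return steps * batch_size * seq_len / elapsed

for bs in (8, 16, 32, 64):
    print(f"batch_size={bs}: {tokens_per_second(bs):,.0f} tokens/sec")
```

Larger batches generally keep the hardware busier, so tokens/sec rises until memory or compute saturates; but each optimizer step then averages gradients over more tokens, so fewer parameter updates happen per token seen.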

(attached image: graph)
Do the values in this graph really make sense? `[40, 180, 320, 460]^T @ [20, 40]` ...
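If that notation means a 4-element column vector multiplied by a 2-element row vector (an outer product), the result would be the 4x2 matrix below; this is just an assumed reading of the truncated expression, not something confirmed by the figure:

$$
\begin{bmatrix} 40 \\ 180 \\ 320 \\ 460 \end{bmatrix}
\begin{bmatrix} 20 & 40 \end{bmatrix}
=
\begin{bmatrix}
 800 & 1600 \\
3600 & 7200 \\
6400 & 12800 \\
9200 & 18400
\end{bmatrix}
$$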
