Typos

#80
by iandanforth - opened

"knowledge and technics" -> "knowledge and techniques"

"here to changes that" -> "here to change that"

"simplest to the most raffined one" -> "simplest to the most refined"

"We'll assumes you" -> "We'll assume you"

"how deep learning model are trained" -> "how deep learning models are trained"

"to fully understand how how performing LLMs" -> "to fully understand how high performing LLMs" (guessing at the intent here)

"what it’s advantages and limits are" -> "what its advantages and limits are"

iandanforth changed discussion title from Typo to Typos

In the cheatsheet:
"ep: context parallelism" → "ep: expert parallelism"

thanks for this amazing book HF !

"When training a neural network model, one store several items in memory:" → "When training a neural network model, one stores several items in memory:"

Nanotron Research org

Awesome! Thanks a lot for the pull request

Is the formula bst = bs * seq correct? bs = bst * seq seems like the correct formula.

and are roughtly familiar -> and are roughly familiar

Nanotron Research org

Is the formula bst = bs * seq correct? bs = bst * seq seems like the correct formula.

It depends on how you define "bst" and "bs". We chose to define "bst" as the batch size in tokens, which is bs * seq (batch size in samples times the sequence length of each sample).
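
To make the definition above concrete, here is a minimal sketch of the arithmetic. The function name and the numbers are made up for illustration and are not taken from the book; it only assumes that every sample in the batch has the same sequence length.

```python
def batch_size_tokens(bs: int, seq: int) -> int:
    """Batch size in tokens (bst) = batch size in samples (bs) * sequence length (seq)."""
    return bs * seq


# Illustrative values only (assumed, not from the book).
bs = 256     # batch size in samples
seq = 4096   # tokens per sample
bst = batch_size_tokens(bs, seq)

print(bst)         # 1048576
print(bst // seq)  # 256, i.e. dividing by seq recovers bs
```

Under this definition, bs = bst / seq, so the alternative reading bs = bst * seq would only hold if "bst" meant something other than a token count.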

Small typos:

  • "Using the Pytorch profiler we can understand how memory is allocated througho ut training" -> "Using the Pytorch profiler we can understand how memory is allocated throughout training"
  • "Why does the first step looks different:" -> "Why does the first step look different:"
  • The TeX type text is not visible here: image.png

When training a neural network model, one store several items in memory: --> When training a neural network model, one stores several items in memory:

I'm not sure which files to change on a PR with edits. I only see the pdf file but not the source markdown file

You would think for a model you could compute the memory requirements exactly but there are a few additional memory occupants that makes it hard to be exact: -->
You would think for a model you could compute the memory requirements exactly, but there are a few additional memory occupants that make it hard to be exact:

Nanotron Research org

Just merged #88 (with a fix for the previous message, where I put a thumbs up). Don't hesitate to ping me if there are still some typos (and thanks for the "old text" -> "new text" format, it made my life much easier ahah)
