Why does this model use left-padding by default?
I have a question about padding in tokenizers. By default, padding tokens are added to the left (start) of the sequence, unless we set padding_side='right' when loading the tokenizer.
Since LLMs process text from left to right, wouldn't having padding tokens at the start potentially affect how the model reads the actual content? I'm trying to understand why this is the default setting.
Also, does anyone know if Gemma-2 models were trained with this left-padding approach?
Hi @smbslt3,
Large Language Models are decoder-only architectures, and during inference left-padding (padding_side='left') is often preferred. Because these models predict the next token from the preceding context, generation continues from the last token in each sequence. If the padding tokens sit on the right, the model is asked to continue from a pad token, and its output can include or be influenced by that padding, leading to incorrect results. Left-padding aligns the batch so that every sequence ends with its real content, letting the model process the meaningful tokens in their intended order and improving the quality of the generated text. For more details, please refer to this link.
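Here is a minimal sketch of what this looks like in practice with the transformers library. The checkpoint name ("google/gemma-2-2b-it") is just an assumed example; any decoder-only causal LM should behave the same way.

```python
# Sketch: batched generation with left-padding (assumed checkpoint name below).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2-2b-it"  # assumed example; use your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_id)

prompts = ["Hello, my name is", "The capital of France is"]

# With left-padding, shorter prompts are padded at the start, so every
# sequence in the batch ends with its real final token. generate() then
# continues from that token rather than from a pad token.
inputs = tokenizer(prompts, padding=True, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```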
does anyone know if Gemma-2 models were trained with this left-padding approach? ==> This is not explicitly documented in the available resources.
Thank you.