Keely0419 committed (verified) · Commit 5e9ac2e · 1 Parent(s): cb1ddcc

Update README.md

Files changed (1): README.md (+3 -0)
README.md CHANGED
@@ -91,6 +91,9 @@ response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
 print(response)
 ~~~~
 
+It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
+Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
+
 ---
 
 # Tokenization
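
For reference, here is a minimal sketch of the setup this commit recommends, assuming the Hugging Face transformers API; the checkpoint id, prompts, and generation parameters below are illustrative and not part of this commit.

~~~~python
# Sketch: batched bf16 inference with eager attention, per the note above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # illustrative Gemma 2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"  # left padding is the usual choice for batched generation

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # avoids the NaN issue with padded bf16 batches
    device_map="auto",
)

# Batched, padded inputs: the case where the default SDPA path can yield NaNs.
prompts = ["Write a haiku about autumn.", "Explain attention in one sentence."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
input_len = inputs["input_ids"].shape[-1]
for seq in outputs:
    print(tokenizer.decode(seq[input_len:], skip_special_tokens=True))
~~~~

Setting attn_implementation="eager" selects the plain PyTorch attention path instead of torch.scaled_dot_product_attention, which sidesteps the NaNs on padded bf16 batches at some cost in speed and memory.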