Update README.md
README.md CHANGED
@@ -91,6 +91,9 @@ response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_token
 print(response)
 ~~~~
 
+It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
+Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
+
 ---
 
 # Tokenization
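In `transformers`, the recommendation above corresponds to passing `attn_implementation="eager"` to `from_pretrained`. Below is a minimal sketch of padded batch generation under bfloat16 with eager attention; the checkpoint id `google/gemma-2-9b-it` and the example prompts are assumptions for illustration:

~~~~python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id assumed for illustration; any Gemma 2 checkpoint applies.
model_id = "google/gemma-2-9b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Request eager attention instead of the default
# torch.scaled_dot_product_attention, which the note above reports
# producing NaNs for padded inputs under bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)

# Batched inference with padding: the exact case the note warns about.
prompts = ["Write a haiku about mountains.", "Hi"]
tokenizer.padding_side = "left"  # pad on the left so generation continues from real tokens
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

outputs = model.generate(**inputs, max_new_tokens=32)
prompt_len = inputs["input_ids"].shape[-1]
for seq in outputs:
    print(tokenizer.decode(seq[prompt_len:], skip_special_tokens=True))
~~~~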