VictorSanh committed • Commit b7e1543 • 1 Parent(s): 791b16d
tips of memory gpu
README.md CHANGED
@@ -213,6 +213,12 @@ print(generated_texts)
# Model optimizations

+**Vision encoder efficiency**
+
+Given the high resolution supported, the vision part of the model can be memory-hungry depending on your configuration. If you are GPU-memory-constrained, you can:
+- **Deactivate image splitting.** To do so, pass `do_image_splitting=False` when initializing the processor (`AutoProcessor.from_pretrained`). No changes are required on the model side. Note that only the SFT model has been trained with image splitting.
+- **Decrease the maximum image resolution.** To do so, pass `size={"longest_edge": 448, "shortest_edge": 378}` when initializing the processor (`AutoProcessor.from_pretrained`). In particular, the `longest_edge` value can be adjusted to fit your needs. We recommend using values that are multiples of 14. No changes are required on the model side.
+
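The two processor options above can be combined. The following is a minimal sketch, not the authors' code: the helper names (`round_down_to_multiple`, `low_memory_processor_kwargs`, `load_processor`) are hypothetical, rounding both edges down to a multiple of 14 is an assumption based on the recommendation above, and the checkpoint name is left as a parameter because it is not given here. Only `AutoProcessor.from_pretrained`, `do_image_splitting`, and `size` come from the text.

```python
def round_down_to_multiple(value: int, base: int = 14) -> int:
    """Round `value` down to the nearest positive multiple of `base`."""
    if value < base:
        raise ValueError(f"value must be at least {base}, got {value}")
    return (value // base) * base


def low_memory_processor_kwargs(
    longest_edge: int = 448,
    shortest_edge: int = 378,
    split_images: bool = False,
) -> dict:
    """Build processor keyword arguments for a memory-constrained GPU.

    Assumption: both edges are snapped down to multiples of 14.
    """
    longest = round_down_to_multiple(longest_edge)
    shortest = round_down_to_multiple(shortest_edge)
    if shortest > longest:
        raise ValueError("shortest_edge must not exceed longest_edge")
    return {
        "do_image_splitting": split_images,
        "size": {"longest_edge": longest, "shortest_edge": shortest},
    }


def load_processor(checkpoint: str, **processor_kwargs):
    """Load the processor with the given options (hypothetical wrapper).

    `transformers` is imported lazily so the helpers above can be used
    without it being installed.
    """
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(checkpoint, **processor_kwargs)


# Build the options only; no model or processor is downloaded here.
kwargs = low_memory_processor_kwargs(longest_edge=450)
print(kwargs)
# {'do_image_splitting': False, 'size': {'longest_edge': 448, 'shortest_edge': 378}}
```

To apply the options, call `load_processor("<your checkpoint>", **kwargs)`. The model itself is loaded as before, since neither option requires a change on the model side.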
**Using Flash-attention 2 to speed up generation**

<details><summary>Click to expand.</summary>