HuggingFaceM4
/

idefics2-8b-base


VictorSanh committed on Apr 15, 2024

Commit

b7e1543

·

verified ·

1 Parent(s): 791b16d

tips of memory gpu


README.md +6 -0


@@ -213,6 +213,12 @@ print(generated_texts)
 # Model optimizations
 **Using Flash-attention 2 to speed up generation**
 <details><summary>Click to expand.</summary>

 # Model optimizations
+**Vision encoder efficiency**
+Because the model supports high-resolution images, the vision encoder can be memory-hungry, depending on your configuration. If you are constrained by GPU memory, you can:
+- **Deactivate image splitting.** To do so, pass `do_image_splitting=False` when initializing the processor (`AutoProcessor.from_pretrained`). No changes are required on the model side. Note that only the SFT model has been trained with image splitting.
+- **Decrease the maximum image resolution.** To do so, pass `size={"longest_edge": 448, "shortest_edge": 378}` when initializing the processor (`AutoProcessor.from_pretrained`). In particular, you can adjust the `longest_edge` value to fit your needs. We recommend values that are multiples of 14. No changes are required on the model side.
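The two options above can be combined when creating the processor. The following is a minimal sketch, not part of the original README: the helper names (`snap_down`, `low_memory_processor_kwargs`, `load_low_memory_processor`) are hypothetical, and rounding the edge values down to a multiple of 14 is one assumed way of following the recommendation. Only `do_image_splitting`, `size`, and `AutoProcessor.from_pretrained` come from the text.

```python
from typing import Any, Dict

MODEL_ID = "HuggingFaceM4/idefics2-8b-base"
EDGE_MULTIPLE = 14  # the text recommends edge values that are multiples of 14


def snap_down(value: int, multiple: int = EDGE_MULTIPLE) -> int:
    """Round `value` down to the nearest multiple of `multiple` (hypothetical helper)."""
    if value < multiple:
        raise ValueError(f"value must be at least {multiple}, got {value}")
    return (value // multiple) * multiple


def low_memory_processor_kwargs(
    longest_edge: int = 448,
    shortest_edge: int = 378,
    do_image_splitting: bool = False,
) -> Dict[str, Any]:
    """Build the processor keyword arguments described in the two bullet points."""
    return {
        # Option 1: disable image splitting.
        "do_image_splitting": do_image_splitting,
        # Option 2: cap the image resolution (values snapped to multiples of 14).
        "size": {
            "longest_edge": snap_down(longest_edge),
            "shortest_edge": snap_down(shortest_edge),
        },
    }


def load_low_memory_processor(**overrides: int):
    """Create the processor with the memory-saving settings applied."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(
        MODEL_ID, **low_memory_processor_kwargs(**overrides)
    )
```

Calling `load_low_memory_processor(longest_edge=700)` would download the processor configuration and apply the smaller image size; the model itself is loaded as usual, since no changes are required on the model side.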
 **Using Flash-attention 2 to speed up generation**
 <details><summary>Click to expand.</summary>