VictorSanh committed
Commit f203285
1 Parent(s): 65cea8f
Files changed (1): README.md (+36 −1)
README.md CHANGED
@@ -170,10 +170,12 @@ print(generated_texts)

  </details>

- **For `idefics2-8b`**

  <details><summary>Click to expand.</summary>

  ```python
  processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
  model = AutoModelForVision2Seq.from_pretrained(
@@ -218,6 +220,39 @@ print(generated_texts)

  </details>

  # Model optimizations

  If your GPU allows, we first recommend loading (and running inference) in half precision (`torch.float16` or `torch.bfloat16`).
 
  </details>

+ **For `idefics2-8b` and `idefics2-8b-chatty`**

  <details><summary>Click to expand.</summary>

+ `idefics2-8b` and `idefics2-8b-chatty` share the same API. Modifying the `from_pretrained` call to select the correct checkpoint is sufficient.
+
  ```python
  processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
  model = AutoModelForVision2Seq.from_pretrained(
 
  </details>

+ **Text generation inference**
+
+ Idefics2 is integrated into [TGI](https://github.com/huggingface/text-generation-inference), and we host API endpoints for both `idefics2-8b` and `idefics2-8b-chatty`.
+
+ Multiple images can be passed in with markdown syntax (`![](IMAGE_URL)`); no spaces are required before or after. Dialogue utterances are separated with `<end_of_utterance>\n` followed by `User:` or `Assistant:`. `User:` is followed by a space when the next characters are text (no space when followed by an image).
+
+ <details><summary>Click to expand.</summary>
+
+ ```python
+ from text_generation import Client
+
+ API_TOKEN = "<YOUR_API_TOKEN>"
+ API_URL = "https://api-inference.huggingface.co/models/HuggingFaceM4/idefics2-8b-chatty"
+
+ # System prompt used in the playground for `idefics2-8b-chatty`
+ SYSTEM_PROMPT = "System: The following is a conversation between Idefics2, a highly knowledgeable and intelligent visual AI assistant created by Hugging Face, referred to as Assistant, and a human user called User. In the following interactions, User and Assistant will converse in natural language, and Assistant will do its best to answer User’s questions. Assistant has the ability to perceive images and reason about them, but it cannot generate images. Assistant was built to be respectful, polite and inclusive. It knows a lot, and always tells the truth. When prompted with an image, it does not make up facts.<end_of_utterance>\nAssistant: Hello, I'm Idefics2, Huggingface's latest multimodal assistant. How can I help you?<end_of_utterance>\n"
+ QUERY = "User:![](https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg)Describe this image.<end_of_utterance>\nAssistant:"
+
+ client = Client(
+     base_url=API_URL,
+     headers={"x-use-cache": "0", "Authorization": f"Bearer {API_TOKEN}"},
+ )
+ generation_args = {
+     "max_new_tokens": 512,
+     "repetition_penalty": 1.1,
+     "do_sample": False,
+ }
+ response = client.generate(prompt=SYSTEM_PROMPT + QUERY, **generation_args)
+ print(response.generated_text)
+ ```
+
+ </details>
+
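The prompt conventions described above (markdown images with no surrounding spaces, `<end_of_utterance>\n` between turns, a space after `User:` only before text) can be sketched with plain string assembly. This is a minimal illustration, not part of the TGI client; `image` and `user_turn` are hypothetical helpers, and the image URL is the one from the query above:

```python
# Assemble an Idefics2 prompt following the conventions above.
IMG = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
EOU = "<end_of_utterance>\n"  # utterance separator

def image(url: str) -> str:
    # Markdown image with no spaces before or after.
    return f"![]({url})"

def user_turn(*parts: str) -> str:
    # "User:" takes a space only when the first part is text, not an image.
    sep = "" if parts and parts[0].startswith("![](") else " "
    return "User:" + sep + "".join(parts) + EOU

# Image first, then the question; end with "Assistant:" to cue the reply.
prompt = user_turn(image(IMG), "Describe this image.") + "Assistant:"
print(prompt)
```

The result matches the `QUERY` string in the example above, so the same helpers could build multi-image or multi-turn prompts.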
256
  # Model optimizations
257
 
258
  If your GPU allows, we first recommend loading (and running inference) in half precision (`torch.float16` or `torch.bfloat16`).
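The half-precision recommendation comes down to bytes per parameter. A back-of-the-envelope sketch, assuming a round 8B parameter count for illustration (the exact checkpoint size differs):

```python
# Rough weight-memory estimate for an ~8B-parameter model
# (illustrative parameter count, not the exact checkpoint size).
params = 8e9
bytes_fp32 = params * 4  # float32: 4 bytes per parameter
bytes_fp16 = params * 2  # float16 / bfloat16: 2 bytes per parameter

GiB = 1024 ** 3
print(f"fp32: {bytes_fp32 / GiB:.1f} GiB, fp16: {bytes_fp16 / GiB:.1f} GiB")
```

Halving the bytes per parameter halves the weight memory, which is often the difference between fitting on a single consumer GPU or not; activations and the KV cache add further overhead on top.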