alvarobartt HF staff committed on
Commit 18620bc · verified · 1 Parent(s): 5c76be9

Fix typos and include dependencies to install

Files changed (1)
  1. README.md +9 -5
README.md CHANGED
@@ -102,7 +102,7 @@ Magma is a multimodal agentic AI model that can generate text based on the input
 
  ### Highlights
  * **Digital and Physical Worlds:** Magma is the first-ever foundation model for multimodal AI agents, designed to handle complex interactions across both virtual and real environments!
- * **Versatile Capabilities:** Magma as a single model not only posseesses generic image and videos understanding ability, but alse generate goal-driven visual plans and actions, making it versatile for different agentic tasks!
+ * **Versatile Capabilities:** Magma as a single model not only possesses generic image and videos understanding ability, but also generate goal-driven visual plans and actions, making it versatile for different agentic tasks!
  * **State-of-the-art Performance:** Magma achieves state-of-the-art performance on various multimodal tasks, including UI navigation, robotics manipulation, as well as generic image and video understanding, in particular the spatial understanding and reasoning!
  * **Scalable Pretraining Strategy:** Magma is designed to be **learned scalably from unlabeled videos** in the wild in addition to the existing agentic data, making it strong generalization ability and suitable for real-world applications!
 
@@ -125,15 +125,20 @@ The model is developed by Microsoft and is funded by Microsoft Research. The mod
 
  <!-- {{ get_started_code | default("[More Information Needed]", true)}} -->
 
- Use the code below to get started with the model.
+ To get started with the model, you first need to make sure that `transformers` and `torch` are installed, as well as installing the following dependencies:
+
+ ```bash
+ pip install torchvision Pillow open_clip_torch
+ ```
+
+ Then you can run the following code:
 
  ```python
  import torch
  from PIL import Image
  import requests
 
- from transformers import AutoModelForCausalLM
- from transformers import AutoProcessor
+ from transformers import AutoModelForCausalLM, AutoProcessor
 
  # Load the model and processor
  model = AutoModelForCausalLM.from_pretrained("microsoft/Magma-8B", trust_remote_code=True)
@@ -159,7 +164,6 @@ with torch.inference_mode():
 
  generate_ids = generate_ids[:, inputs["input_ids"].shape[-1] :]
  response = processor.decode(generate_ids[0], skip_special_tokens=True).strip()
-
  print(response)
  ```
 
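The updated README asks readers to have `transformers` and `torch` installed, plus `torchvision`, `Pillow` and `open_clip_torch`, before running the Magma-8B example. As a minimal sketch that is not part of this commit, the snippet below checks whether those packages can be located, assuming their usual import names (`PIL` for Pillow, `open_clip` for open_clip_torch). The `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced only for illustration.

```python
import importlib.util

# Hypothetical mapping from the pip distribution names mentioned in the README
# to the module names they are imported as (assumed standard import names).
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "torchvision": "torchvision",
    "Pillow": "PIL",
    "open_clip_torch": "open_clip",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose top-level import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        # find_spec returns None for a top-level module that is not installed
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Suggest an install command covering only the missing distributions
        print("pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that each module can be located on the import path; it does not verify that the installed versions are compatible with the model's remote code.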