soldni committed
Commit ef36fa4
1 Parent(s): def21c3

Update README.md

Files changed (1)
  1. README.md +78 -19
README.md CHANGED
@@ -2,35 +2,94 @@
  license: apache-2.0
  language:
  - en
- tags:
- - moe
- - olmo
- - olmoe
- - molmo
- - molmoe
- co2_eq_emissions: 1
+ base_model:
+ - openai/clip-vit-large-patch14-336
+ - allenai/OLMoE-1B-7B-0924
  datasets:
  - allenai/OLMoE-mix-0924
- library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - multimodal
+ - moe
+ - olmo
+ - olmoe
+ - molmo
+ - molmoe
  ---

- <img alt="Molmo Logo." src="molmo_logo.png" width="250px">
-
- # Model Summary
-
- > MolmoE-1B is a multimodal Mixture-of-Experts LLM with 1.5B active and 7.2B total parameters released in September 2024 (0924) based on [OLMoE-1B-7B-0924](https://huggingface.co/allenai/OLMoE-1B-7B-0924). It yields state-of-the-art performance among multimodal models with a similar size while being fully open-source.
-
- - **Paper:** WIP
- - **Code:** WIP
-
- # Use
-
- WIP
-
- # Evaluation Snapshot
-
- WIP
-
- # Citation
-
- WIP
+ <img src="molmo_logo.png" alt="Logo for the Molmo Project" style="width: auto; height: 50px;">
+
+ # MolmoE 1B
+
+ Molmo is a family of open vision-language models developed by the Allen Institute for AI. Molmo models are trained on PixMo, a dataset of 1 million highly curated image-text pairs, and achieve state-of-the-art performance among multimodal models of similar size while being fully open-source. You can find all models in the Molmo family [here](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).
+
+ MolmoE-1B is a multimodal Mixture-of-Experts LLM with 1.5B active and 7.2B total parameters, released in September 2024 (0924) and based on [OLMoE-1B-7B-0924](https://huggingface.co/allenai/OLMoE-1B-7B-0924). It yields state-of-the-art performance among multimodal models of similar size while being fully open-source.
+
+ This checkpoint is a **preview** of the Molmo release. All artifacts used in creating Molmo (the PixMo dataset, training code, evaluations, and intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
+
+ **[Sign up here](https://docs.google.com/forms/d/e/1FAIpQLSdML1MhNNBDsCHpgWG65Oydg2SjZzVasyqlP08nBrWjZp_c7A/viewform)** to be the first to know when artifacts are released.
+
+ ## Quick Start
+
+ To run MolmoE, first install dependencies:
+
+ ```bash
+ pip install einops tensorflow torchvision
+ ```
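+
+ The example below also assumes `transformers`, `torch`, `Pillow`, and `requests` are available in your environment. If they are not, a minimal install sketch (package list inferred from the imports in the example, not taken from the original card) is:
+
+ ```bash
+ pip install transformers torch pillow requests
+ ```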
+
+ Then, follow these steps:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
+ from PIL import Image
+ import requests
+
+ # load the processor
+ processor = AutoProcessor.from_pretrained(
+     'allenai/MolmoE-1B-0924',
+     trust_remote_code=True,
+     torch_dtype='auto',
+     device_map='auto'
+ )
+
+ # load the model
+ model = AutoModelForCausalLM.from_pretrained(
+     'allenai/MolmoE-1B-0924',
+     trust_remote_code=True,
+     torch_dtype='auto',
+     device_map='auto'
+ )
+
+ # process the image and text
+ inputs = processor.process(
+     images=[Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)],
+     text="Describe this image."
+ )
+
+ # move inputs to the correct device and make a batch of size 1
+ inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
+
+ # generate output; maximum 200 new tokens; stop generation when <|endoftext|> is generated
+ output = model.generate_from_batch(
+     inputs,
+     GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+     tokenizer=processor.tokenizer
+ )
+
+ # only get generated tokens; decode them to text
+ generated_tokens = output[0, inputs['input_ids'].size(1):]
+ generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
+
+ # print the generated text
+ print(generated_text)
+
+ # >>> This photograph captures an adorable black Labrador puppy sitting on a weathered
+ # wooden deck. The deck's planks, which are a mix of light and dark brown with ...
+ ```
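+
+ Depending on your hardware, you may also want to keep activations in half precision during generation. A minimal sketch (not part of the original card; it assumes a CUDA-capable GPU and reuses `model`, `inputs`, `processor`, and `GenerationConfig` from the Quick Start snippet above) wraps the generation call in `torch.autocast`:
+
+ ```python
+ # illustrative sketch, not from the original card: run generation under bfloat16
+ # autocast on a CUDA device; reuses `model`, `inputs`, `processor`, and
+ # `GenerationConfig` as defined in the Quick Start example above.
+ import torch
+
+ with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
+     output = model.generate_from_batch(
+         inputs,
+         GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+         tokenizer=processor.tokenizer
+     )
+ ```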
+
+ ## License and Use
+
+ This model is licensed under Apache 2.0. It is intended for research and educational use.
+ For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).