Create README.md

yueqis committed on commit 085034d · verified · 1 parent: aaef498

Files changed (1): README.md +201 -0 (added)
---
license: apache-2.0
datasets:
- neulab/PangeaInstruct
language:
- am
- ar
- bg
- bn
- cs
- de
- el
- en
- es
- fa
- fr
- ga
- hi
- id
- ig
- it
- iw
- ja
- jv
- ko
- nl
- mn
- ms
- no
- pl
- pt
- ro
- ru
- si
- su
- sw
- ta
- te
- th
- tr
- uk
- ur
- vi
- zh
base_model:
- Qwen/Qwen2-7B-Instruct
---
# Pangea-7B Model Card

[Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages](https://neulab.github.io/Pangea/)

🇪🇹 🇸🇦 🇧🇬 🇧🇩 🇨🇿 🇩🇪 🇬🇷 🇬🇧 🇺🇸 🇪🇸 🇮🇷 🇫🇷 🇮🇪 🇮🇳 🇮🇩 🇳🇬 🇮🇹 🇮🇱 🇯🇵 🇮🇩 🇰🇷 🇳🇱 🇲🇳 🇲🇾 🇳🇴 🇵🇱 🇵🇹 🇧🇷 🇷🇴 🇷🇺 🇱🇰 🇮🇩 🇰🇪 🇹🇿 🇱🇰 🇹🇭 🇹🇷 🇺🇦 🇵🇰 🇻🇳 🇨🇳 🇹🇼

[🏠 Homepage](https://neulab.github.io/Pangea/) | [🤖 Pangea-7B](https://huggingface.co/neulab/Pangea-7B) | [📊 PangeaIns](https://huggingface.co/datasets/neulab/PangeaInstruct) | [🧪 PangeaBench](https://huggingface.co/collections/neulab/pangea-6713c3b0d78a453906eb2ed8) | [💻 Github](https://github.com/neulab/Pangea/tree/main) | [📄 Arxiv](https://arxiv.org/abs/2410.16153) | [📕 PDF](https://arxiv.org/pdf/2410.16153) | [🖥️ Demo](https://huggingface.co/spaces/neulab/Pangea)

<img src="https://cdn-uploads.huggingface.co/production/uploads/6230d750d93e84e233882dbc/ZjVTKnIsyshWpo-PWg9gM.png" alt="description" style="width:300px;">


## Model details

- **Model:** Pangea is a fully open-source Multilingual Multimodal Multicultural LLM.
- **Date:** Pangea-7B was trained in 2024.
- **Training Dataset:** [6M PangeaIns](https://huggingface.co/datasets/neulab/PangeaInstruct).
- **Architecture:** Pangea-7B follows the architecture of [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT), with a [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) backbone.

## Uses

Pangea-7B follows the architecture of [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT).

You can either (1) follow the same model loading procedure as [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) (an example of loading Pangea-7B directly is shown in the Python code below), or (2) use our hf version of Pangea-7B: [Pangea-7B-hf](https://huggingface.co/neulab/Pangea-7B-hf).

### Direct Use
First, clone and install [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT):

```bash
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
pip install -e ".[train]"
```

Then load Pangea-7B with the following code:
```python
from llava.model.builder import load_pretrained_model

model_path = 'neulab/Pangea-7B'
model_name = 'Pangea-7B-qwen'
args = {"multimodal": True}
# The second positional argument (the base model) is None here.
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, **args)
```

Next, define some helper functions for using the model:
```python
import re

import torch
import transformers
from PIL import Image
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN


def preprocess_qwen(
    sources,
    tokenizer: transformers.PreTrainedTokenizer,
    has_image: bool = False,
    max_len=2048,
    system_message: str = "You are a helpful assistant.",
) -> torch.Tensor:
    """Build the input ids for one conversation, marking image positions with IMAGE_TOKEN_INDEX."""
    roles = {"human": "<|im_start|>user", "gpt": "<|im_start|>assistant"}
    # The unpacking expects exactly two additional special tokens: the start and end markers.
    im_start, im_end = tokenizer.additional_special_tokens_ids
    nl_tokens = tokenizer("\n").input_ids
    _system = tokenizer("system").input_ids + nl_tokens

    source = sources
    # Skip a leading turn that does not come from the user.
    if roles[source[0]["from"]] != roles["human"]:
        source = source[1:]

    # System turn: <|im_start|>system\n{system_message}<|im_end|>\n
    input_id = [im_start] + _system + tokenizer(system_message).input_ids + [im_end] + nl_tokens

    for sentence in source:
        role = roles[sentence["from"]]
        if has_image and sentence["value"] is not None and "<image>" in sentence["value"]:
            # Tokenize the text around each <image> placeholder and put
            # IMAGE_TOKEN_INDEX where the placeholder was.
            num_image = len(re.findall(DEFAULT_IMAGE_TOKEN, sentence["value"]))
            texts = sentence["value"].split('<image>')
            _input_id = tokenizer(role).input_ids + nl_tokens
            for i, text in enumerate(texts):
                _input_id += tokenizer(text).input_ids
                if i < len(texts) - 1:
                    _input_id += [IMAGE_TOKEN_INDEX] + nl_tokens
            _input_id += [im_end] + nl_tokens
            assert sum(i == IMAGE_TOKEN_INDEX for i in _input_id) == num_image
        elif sentence["value"] is None:
            # Open turn (no end marker): the model continues from here.
            _input_id = tokenizer(role).input_ids + nl_tokens
        else:
            _input_id = tokenizer(role).input_ids + nl_tokens + tokenizer(sentence["value"]).input_ids + [im_end] + nl_tokens
        input_id += _input_id

    # Return a batch of size 1.
    return torch.tensor([input_id], dtype=torch.long)


def generate_output(prompt, image=None, do_sample=False, temperature=0, top_p=0.5, num_beams=1, max_new_tokens=1024):
    """Answer `prompt` about the image stored at path `image`.

    Uses the `tokenizer`, `model` and `image_processor` loaded above.
    """
    if image is None:
        raise ValueError("generate_output requires the path to an image file")
    prompt = "<image>\n" + prompt
    image = Image.open(image)
    image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values']
    image_tensors = [image_tensor.half().cuda()]
    input_ids = preprocess_qwen(
        [{'from': 'human', 'value': prompt}, {'from': 'gpt', 'value': None}],
        tokenizer,
        has_image=True,
    ).cuda()
    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=image_tensors,
            do_sample=do_sample,
            temperature=temperature,
            top_p=top_p,
            num_beams=num_beams,
            max_new_tokens=max_new_tokens,
            use_cache=True
        )
    outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
    return outputs.strip()
```
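
The least obvious step in `preprocess_qwen` is how `<image>` placeholders become sentinel ids. The following is a minimal, self-contained sketch of that step only. It makes two assumptions that are not part of the code above: `toy_tokenize` is a hypothetical stand-in for the real tokenizer (one id per character), and the sentinel value `-200` is hard-coded to mirror `llava.constants.IMAGE_TOKEN_INDEX` so the sketch runs without LLaVA-NeXT installed.

```python
import re

# Assumption: hard-coded to mirror llava.constants.IMAGE_TOKEN_INDEX.
IMAGE_TOKEN_INDEX = -200
IMAGE_PLACEHOLDER = "<image>"


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one id per character (its code point)."""
    return [ord(ch) for ch in text]


def encode_with_image_slots(text):
    """Tokenize `text`, replacing each <image> placeholder with the sentinel id plus a newline."""
    newline_ids = toy_tokenize("\n")
    pieces = text.split(IMAGE_PLACEHOLDER)
    ids = []
    for i, piece in enumerate(pieces):
        ids += toy_tokenize(piece)
        if i < len(pieces) - 1:  # a placeholder sat between this piece and the next
            ids += [IMAGE_TOKEN_INDEX] + newline_ids
    # Same sanity check as in preprocess_qwen: one sentinel per placeholder.
    assert ids.count(IMAGE_TOKEN_INDEX) == len(re.findall(IMAGE_PLACEHOLDER, text))
    return ids


ids = encode_with_image_slots("<image>\nHi")
print(ids)  # [-200, 10, 10, 72, 105]
```

The sentinel is negative so that it cannot collide with a real vocabulary id.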

The following example runs the model on an image:
```python
prompt = "What do you see in the image?"
image = "image.png"  # path to a local image file
print(generate_output(prompt, image=image))
```

Note that the example above demonstrates multimodal usage. To use the model with text-only inputs, reload the model with:
```python
args = {"multimodal": True}
# The image processor is not needed for text-only inputs, so it is discarded.
tokenizer, model, _, context_len = load_pretrained_model(model_path, None, model_name, **args)

def generate_output_text_only(prompt, do_sample=False, temperature=0, top_p=0.5, num_beams=1, max_new_tokens=1024):
    """Answer a text-only `prompt` using the `tokenizer` and `model` loaded above."""
    input_ids = preprocess_qwen(
        [{'from': 'human', 'value': prompt}, {'from': 'gpt', 'value': None}],
        tokenizer,
        has_image=False,
    ).cuda()
    with torch.inference_mode():
        generated_ids = model.generate(
            input_ids,
            do_sample=do_sample,
            temperature=temperature,
            top_p=top_p,
            num_beams=num_beams,
            max_new_tokens=max_new_tokens,
            use_cache=True
        )
    # Drop the first len(prompt) ids from each returned sequence.
    generated_ids = [output[len(prompt_ids):] for prompt_ids, output in zip(input_ids, generated_ids)]
    outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return outputs.strip()

prompt = "Write me a Python function that sorts an input list of integers in descending order"
print(generate_output_text_only(prompt))
```
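
The slice near the end of `generate_output_text_only` removes the prompt from each returned sequence before decoding. Here is a minimal sketch of that step with plain Python lists, assuming (as the slice implies) that `generate` returns the prompt ids followed by the newly generated ids. The helper name `strip_prompt` and the example ids are hypothetical.

```python
def strip_prompt(prompt_batch, generated_batch):
    """Drop the leading prompt ids from each generated sequence, keeping only new tokens."""
    return [
        generated[len(prompt):]
        for prompt, generated in zip(prompt_batch, generated_batch)
    ]


prompt_batch = [[1, 2, 3]]               # one prompt of three token ids
generated_batch = [[1, 2, 3, 40, 50]]    # prompt followed by two new ids
print(strip_prompt(prompt_batch, generated_batch))  # [[40, 50]]
```

If a model returns only the new tokens, this slice would remove real output, so it is worth checking what `generate` returns in your setup before relying on it.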

## Citing the Model

**BibTeX Citation:**

```
@article{yue2024pangeafullyopenmultilingual,
  title={Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages},
  author={Xiang Yue and Yueqi Song and Akari Asai and Seungone Kim and Jean de Dieu Nyandwi and Simran Khanuja and Anjali Kantharuban and Lintang Sutawika and Sathyanarayanan Ramamoorthy and Graham Neubig},
  year={2024},
  journal={arXiv preprint arXiv:2410.16153},
  url={https://arxiv.org/abs/2410.16153}
}
```