sujitvasanth and YouLiXiya committed
Commit ca95887 · verified · 0 Parent(s)

Duplicate from YouLiXiya/tinyllava-v1.0-1.1b-hf


Co-authored-by: Youli <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ language:
+ - en
+ pipeline_tag: image-to-text
+ inference: false
+ arxiv: 2304.08485
+ license: apache-2.0
+ ---
+ # LLaVA Model Card
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/FPshq08TKYD0e-qwPLDVO.png)
+
+ Below is the model card of the TinyLLaVA 1.1b model.
+
+ Also check out the Google Colab demo to run LLaVA on a free-tier Google Colab instance: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1XtdA_UoyNzqiEYVR-iWA-xmit8Y2tKV2#scrollTo=DFVZgElEQk3x)
+
+
+ ## Model details
+
+ **Model type:**
+ TinyLLaVA is an open-source chatbot trained by fine-tuning TinyLlama on GPT-generated multimodal instruction-following data.
+ It is an auto-regressive language model based on the transformer architecture.
+
+ **Paper or resources for more information:**
+ https://llava-vl.github.io/
+
+ ## How to use the model
+
+ First, make sure you have `transformers >= 4.35.3`.
+ The model supports multi-image and multi-prompt generation, meaning that you can pass multiple images in your prompt. Also make sure to follow the correct prompt template (`USER: xxx\nASSISTANT:`) and add the token `<image>` at the location where you want to query images:
+
+ ### Using `pipeline`:
+
+ Below we use the [`"YouLiXiya/tinyllava-v1.0-1.1b-hf"`](https://huggingface.co/YouLiXiya/tinyllava-v1.0-1.1b-hf) checkpoint.
+
+ ```python
+ from transformers import pipeline
+ from PIL import Image
+ import requests
+
+ model_id = "YouLiXiya/tinyllava-v1.0-1.1b-hf"
+ pipe = pipeline("image-to-text", model=model_id)
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
+
+ image = Image.open(requests.get(url, stream=True).raw)
+ prompt = "USER: <image>\nWhat does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud\nASSISTANT:"
+
+ outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
+ print(outputs)
+ # {'generated_text': 'USER: \nWhat does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud\nASSISTANT: The label 15 represents lava, which is the type of rock that is formed from molten magma. '}
+ ```
+
+ ### Using pure `transformers`:
+
+ Below is an example script to run generation in `float16` precision on a GPU device:
+
+ ```python
+ import requests
+ from PIL import Image
+
+ import torch
+ from transformers import AutoProcessor, LlavaForConditionalGeneration
+
+ model_id = "YouLiXiya/tinyllava-v1.0-1.1b-hf"
+
+ prompt = "USER: <image>\nWhat are these?\nASSISTANT:"
+ image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
+
+ model = LlavaForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     low_cpu_mem_usage=True,
+ ).to(0)
+
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ raw_image = Image.open(requests.get(image_file, stream=True).raw)
+ inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)
+
+ output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
+ print(processor.decode(output[0][2:], skip_special_tokens=True))
+ ```
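+
+ The multi-image support mentioned above can be exercised with the same `model` and `processor` objects. The following is a minimal, untested sketch that assumes the processor pairs each `<image>` token in the prompt with one entry in the list of images:
+
+ ```python
+ # Hypothetical multi-image sketch, reusing `model` and `processor` from the snippet above.
+ url_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ url_2 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
+ image_1 = Image.open(requests.get(url_1, stream=True).raw)
+ image_2 = Image.open(requests.get(url_2, stream=True).raw)
+
+ # One <image> token per image, in the same order as the images list (assumption).
+ multi_prompt = "USER: <image>\n<image>\nWhat is shown in these two images?\nASSISTANT:"
+ inputs = processor(text=multi_prompt, images=[image_1, image_2], return_tensors="pt").to(0, torch.float16)
+
+ output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
+ print(processor.decode(output[0], skip_special_tokens=True))
+ ```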
+
+ ### Model optimization
+
+ #### 4-bit quantization through the `bitsandbytes` library
+
+ First, make sure to install `bitsandbytes` (`pip install bitsandbytes`) and that you have access to a CUDA-compatible GPU device. Then simply change the snippet above as follows:
+
+ ```diff
+ model = LlavaForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     low_cpu_mem_usage=True,
+ +   load_in_4bit=True
+ )
+ ```
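+
+ On recent `transformers` releases, the same 4-bit setup can also be expressed with an explicit `BitsAndBytesConfig` instead of the `load_in_4bit` flag. A minimal sketch, assuming `bitsandbytes` is installed:
+
+ ```python
+ import torch
+ from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration
+
+ # Sketch only: 4-bit loading via an explicit quantization config.
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ model = LlavaForConditionalGeneration.from_pretrained(
+     "YouLiXiya/tinyllava-v1.0-1.1b-hf",
+     quantization_config=quantization_config,
+     low_cpu_mem_usage=True,
+ )
+ ```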
+
+ #### Use Flash-Attention 2 to further speed up generation
+
+ First, make sure to install `flash-attn`; refer to the [original repository of Flash Attention](https://github.com/Dao-AILab/flash-attention) for installation instructions. Then simply change the snippet above as follows:
+
+ ```diff
+ model = LlavaForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     low_cpu_mem_usage=True,
+ +   use_flash_attention_2=True
+ ).to(0)
+ ```
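+
+ Note that newer `transformers` versions replace the `use_flash_attention_2` flag with the `attn_implementation` argument. A hedged alternative sketch:
+
+ ```python
+ # Sketch only: Flash-Attention 2 requested via the newer attn_implementation argument.
+ model = LlavaForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     low_cpu_mem_usage=True,
+     attn_implementation="flash_attention_2",
+ ).to(0)
+ ```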
+
+ ## License
+ Llama 2 is licensed under the LLAMA 2 Community License,
+ Copyright (c) Meta Platforms, Inc. All Rights Reserved.
added_tokens.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "<image>": 32000,
+   "<pad>": 32001
+ }
config.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "architectures": [
+     "LlavaForConditionalGeneration"
+   ],
+   "ignore_index": -100,
+   "image_token_index": 32000,
+   "model_type": "llava",
+   "pad_token_id": 32001,
+   "projector_hidden_act": "gelu",
+   "text_config": {
+     "_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-V1.0",
+     "architectures": [
+       "LlamaForCausalLM"
+     ],
+     "hidden_size": 2048,
+     "intermediate_size": 5632,
+     "model_type": "llama",
+     "num_hidden_layers": 22,
+     "num_key_value_heads": 4,
+     "rms_norm_eps": 1e-05,
+     "torch_dtype": "bfloat16",
+     "vocab_size": 32064
+   },
+   "torch_dtype": "float16",
+   "transformers_version": "4.37.0.dev0",
+   "vision_config": {
+     "dropout": 0.0,
+     "hidden_size": 1024,
+     "image_size": 224,
+     "intermediate_size": 4096,
+     "model_type": "clip_vision_model",
+     "num_attention_heads": 16,
+     "num_hidden_layers": 24,
+     "patch_size": 14,
+     "projection_dim": 768
+   },
+   "vision_feature_layer": -2,
+   "vision_feature_select_strategy": "default",
+   "vocab_size": 32064
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 32001,
+   "transformers_version": "4.37.0.dev0"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d151cbb50192fde78dec713c2cd15d68c73183aab3be97ebbb2f5865bac08611
+ size 2819651632
preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": true,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_processor_type": "CLIPImageProcessor",
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "processor_class": "LlavaProcessor",
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32000": {
+       "content": "<image>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32001": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 2048,
+   "pad_token": "<pad>",
+   "padding_side": "right",
+   "processor_class": "LlavaProcessor",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }