tags:
- vision-language
- llm
- lmm
---

<h2 align="center"> <a href="https://arxiv.org/abs/2402.14289">TinyLLaVA: A Framework of Small-scale Large Multimodal Models</a> </h2>

<h5 align="center">

[![hf_space](https://img.shields.io/badge/🤗-%20Open%20In%20HF-blue.svg)](https://huggingface.co/bczhou/TinyLLaVA-3.1B) [![arXiv](https://img.shields.io/badge/Arxiv-2402.14289-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2402.14289) [![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/LICENSE)

</h5>

## &#x1F389; News
* **[2024.02.25]** Updated evaluation scripts and docs!
* **[2024.02.25]** Data descriptions are out. Released TinyLLaVA-1.5B and TinyLLaVA-2.0B!
* **[2024.02.24]** Added example code for inference and model loading!
* **[2024.02.23]** Released the evaluation code and scripts!
* **[2024.02.21]** Created the [TinyLLaVABench](https://github.com/DLCV-BUAA/TinyLLavaBench) repository on GitHub!
* **[2024.02.21]** Our paper, [TinyLLaVA: A Framework of Small-scale Large Multimodal Models](https://arxiv.org/abs/2402.14289), is out!
* **[2024.01.11]** Our first model, [TinyLLaVA-1.4B](https://huggingface.co/bczhou/tiny-llava-v1-hf), is out!

## &#x231B; TODO
- [ ] Add support for Ollama and llama.cpp.
- [ ] Developers' guide / how to build the demo locally.
- [x] Model Zoo descriptions.
- [x] Examples and inference.
- [x] Release code for training.
- [x] Add descriptions for evaluation.
- [x] Add descriptions for data preparation.
- [x] Release TinyLLaVA-1.5B and TinyLLaVA-2.0B.
- [x] Release TinyLLaVA-3.1B.
- [x] Release the evaluation code and weights (2024.02.23).

### &#x1F525; High performance, but with fewer parameters

- Our best model, TinyLLaVA-3.1B, achieves better overall performance than existing 7B models such as LLaVA-1.5 and Qwen-VL.

## &#x1F433; Model Zoo
### Legacy Model
> https://huggingface.co/bczhou/tiny-llava-v1-hf

### Pretrained Models
- [TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B)
- [TinyLLaVA-2.0B](https://huggingface.co/bczhou/TinyLLaVA-2.0B)
- [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B)

### Benchmark Results

| Name | LLM | Checkpoint | LLaVA-Bench-Wild | MME | MMBench | MM-Vet | SQA-image | VQA-v2 | GQA | TextVQA |
|----------------|-----------------|----------------------------------------------------------------|------------------|--------|---------|--------|-----------|--------|------|---------|
| TinyLLaVA-3.1B | Phi-2 | [TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B) | 75.8 | 1464.9 | 66.9 | 32.0 | 69.1 | 79.9 | 62.0 | 59.1 |
| TinyLLaVA-2.0B | StableLM-2-1.6B | [TinyLLaVA-2.0B](https://huggingface.co/bczhou/TinyLLaVA-2.0B) | 66.4 | 1433.8 | 63.3 | 32.6 | 64.7 | 78.9 | 61.9 | 56.4 |
| TinyLLaVA-1.5B | TinyLlama | [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B) | 60.8 | 1276.5 | 55.2 | 25.8 | 60.3 | 76.9 | 60.3 | 51.7 |

## &#x1F527; Requirements and Installation

We recommend the following setup.

1. Clone this repository and navigate to the TinyLLaVABench folder
```bash
git clone https://github.com/DLCV-BUAA/TinyLLaVABench.git
cd TinyLLaVABench
```

2. Install the package
```Shell
conda create -n tinyllava python=3.10 -y
conda activate tinyllava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

3. Install additional packages for training
```Shell
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
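
As a quick sanity check that the editable install worked, you can try importing the package inside the `tinyllava` environment (a minimal sketch using the same import paths as the Quick Start below; not part of the original instructions):

```Python
# Post-install sanity check: these imports are the ones used in the Quick Start below.
# Assumes the `tinyllava` conda environment is active and `pip install -e .` succeeded.
from tinyllava.model.builder import load_pretrained_model
from tinyllava.mm_utils import get_model_name_from_path

print("TinyLLaVA package imported successfully")
```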

### Upgrade to the latest code base

```Shell
git pull
pip install -e .

# If you see import errors after upgrading, try reinstalling flash-attn:
# pip install flash-attn --no-build-isolation --no-cache-dir
```

## &#x1F527; Quick Start

<details>
<summary>Load model</summary>

```Python
from tinyllava.model.builder import load_pretrained_model
from tinyllava.mm_utils import get_model_name_from_path
from tinyllava.eval.run_tiny_llava import eval_model

model_path = "bczhou/TinyLLaVA-3.1B"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)
```
</details>

## &#x1F527; Run Inference

Here's an example of running inference with [TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B).

<details>
<summary>Run Inference</summary>

```Python
from tinyllava.model.builder import load_pretrained_model
from tinyllava.mm_utils import get_model_name_from_path
from tinyllava.eval.run_tiny_llava import eval_model

model_path = "bczhou/TinyLLaVA-3.1B"
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": "phi",
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
```
</details>

### Important
We use a different `conv_mode` for each model. Set `conv_mode` in `args` according to this table:

| model | conv_mode |
|----------------|-----------|
| TinyLLaVA-3.1B | phi |
| TinyLLaVA-2.0B | phi |
| TinyLLaVA-1.5B | v1 |
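
For example, to run the same inference with [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B), only the checkpoint and the `conv_mode` need to change (a sketch based on the `args` object above):

```Python
from tinyllava.mm_utils import get_model_name_from_path
from tinyllava.eval.run_tiny_llava import eval_model

model_path = "bczhou/TinyLLaVA-1.5B"  # TinyLlama-based checkpoint

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "What are the things I should be cautious about when I visit here?",
    "conv_mode": "v1",  # TinyLLaVA-1.5B uses the v1 template (see table above)
    "image_file": "https://llava-vl.github.io/static/images/view.jpg",
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
```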

## Evaluation
To ensure reproducibility, we evaluate the models with greedy decoding.
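
In terms of the inference arguments shown above, greedy decoding simply means sampling is disabled and a single beam is used (an illustrative sketch; the exact evaluation scripts are linked below):

```Python
# Greedy decoding settings, matching the inference args above:
# temperature 0 disables sampling, and a single beam makes decoding deterministic.
greedy_decoding_args = {
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
}
```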

See [Evaluation.md](https://github.com/DLCV-BUAA/TinyLLaVABench/blob/main/docs/Evaluation.md)

## &#x270F; Citation

If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:.

```BibTeX
@misc{zhou2024tinyllava,
      title={TinyLLaVA: A Framework of Small-scale Large Multimodal Models},
      author={Baichuan Zhou and Ying Hu and Xi Weng and Junlong Jia and Jie Luo and Xien Liu and Ji Wu and Lei Huang},
      year={2024},
      eprint={2402.14289},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```