void0721 committed
Commit 38cb900 · 1 Parent(s): 08631f9

Update README.md

Files changed (1): README.md (+16 -8)
README.md CHANGED
@@ -12,7 +12,7 @@ Try out our [web demo 🚀](http://imagebind-llm.opengvlab.com/) here!
 
 ## Introduction
 
-We present $\color{goldenrod}{SPHINX}$, a versatile multi-modal large language model (MLLM) with a mixer of training tasks, data domains, and visual embeddings.
+We present SPHINX, a versatile multi-modal large language model (MLLM) with a mixer of training tasks, data domains, and visual embeddings.
 
 - **Task Mix.** For all-purpose capabilities, we mix a variety of vision-language tasks for mutual improvement: VQA, REC, REG, OCR, etc.
 
@@ -21,24 +21,32 @@ We present $\color{goldenrod}{SPHINX}$, a versatile multi-modal large language m
 - **Domain Mix.** For data from real-world and synthetic domains, we mix the weights of two domain-specific models for complementarity.
 
 <p align="left">
-  <img src="figs/pipeline1.png"/ width="60%"> <br>
+  <img src="figs/pipeline1.png"/ width="100%"> <br>
 </p>
 <p align="left">
-  <img src="figs/pipeline2.png"/ width="60%"> <br>
+  <img src="figs/pipeline2.png"/ width="100%"> <br>
 </p>
 
 ## Result
+
+**Evaluation Prompt Design**
 <p align="left">
-  <img src="figs/table1.png"/ width="50%"> <br>
+  <img src="figs/table1.png"/ width="100%"> <br>
 </p>
+
+**Benchmarks on Multimodal Large Language Models**
 <p align="left">
-  <img src="figs/table2.png"/ width="50%"> <br>
-</p>
+  <img src="figs/table2.png"/ width="100%"> <br>
+</p
+
+**Visual Question Answering**
 <p align="left">
-  <img src="figs/table3.png"/ width="50%"> <br>
+  <img src="figs/table3.png"/ width="100%"> <br>
 </p>
+
+**Visual Grounding**
 <p align="left">
-  <img src="figs/table4.png"/ width="50%"> <br>
+  <img src="figs/table4.png"/ width="100%"> <br>
 </p>
 
 ## Inference
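
The "Domain Mix" bullet in the README above describes merging two domain-specific models by mixing their weights. As a rough illustration only (not the repository's actual code), the sketch below linearly interpolates two PyTorch checkpoints; the checkpoint filenames and the 0.5 mixing ratio are assumptions for the example.

```python
# Minimal sketch of weight mixing between two domain-specific checkpoints.
# Paths and the mixing ratio `alpha` are hypothetical, not the SPHINX recipe.
import torch


def mix_domain_weights(ckpt_a_path: str, ckpt_b_path: str, alpha: float = 0.5) -> dict:
    """Return a state dict whose float tensors are alpha * A + (1 - alpha) * B."""
    state_a = torch.load(ckpt_a_path, map_location="cpu")
    state_b = torch.load(ckpt_b_path, map_location="cpu")
    mixed = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        if not torch.is_floating_point(tensor_a):
            # Keep integer buffers (e.g., step counters) from the first model as-is.
            mixed[name] = tensor_a
            continue
        mixed[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return mixed


if __name__ == "__main__":
    # Hypothetical real-world-domain and synthetic-domain checkpoints.
    mixed_state = mix_domain_weights("sphinx_real.pth", "sphinx_synthetic.pth", alpha=0.5)
    torch.save(mixed_state, "sphinx_mixed.pth")
```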