Alpha-VLLM/SPHINX

void0721 committed on Nov 3, 2023

Commit 38cb900

1 Parent(s): 08631f9

Update README.md

Files changed (1)

README.md CHANGED

@@ -12,7 +12,7 @@ Try out our [web demo 🚀](http://imagebind-llm.opengvlab.com/) here!
 ## Introduction
-We present $\color{goldenrod}{SPHINX}$, a versatile multi-modal large language model (MLLM) with a mixer of training tasks, data domains, and visual embeddings.
 - **Task Mix.** For all-purpose capabilities, we mix a variety of vision-language tasks for mutual improvement: VQA, REC, REG, OCR, etc.
@@ -21,24 +21,32 @@ We present $\color{goldenrod}{SPHINX}$, a versatile multi-modal large language m
 - **Domain Mix.** For data from real-world and synthetic domains, we mix the weights of two domain-specific models for complementarity.
 <p align="left">
-  <img src="figs/pipeline1.png"/ width="60%"> <br>
 </p>
 <p align="left">
-  <img src="figs/pipeline2.png"/ width="60%"> <br>
 </p>
 ## Result
 <p align="left">
-  <img src="figs/table1.png"/ width="50%"> <br>
 </p>
 <p align="left">
-  <img src="figs/table2.png"/ width="50%"> <br>
-</p>
 <p align="left">
-  <img src="figs/table3.png"/ width="50%"> <br>
 </p>
 <p align="left">
-  <img src="figs/table4.png"/ width="50%"> <br>
 </p>
 ## Inference

 ## Introduction
+We present SPHINX, a versatile multi-modal large language model (MLLM) with a mixer of training tasks, data domains, and visual embeddings.
 - **Task Mix.** For all-purpose capabilities, we mix a variety of vision-language tasks for mutual improvement: VQA, REC, REG, OCR, etc.
 - **Domain Mix.** For data from real-world and synthetic domains, we mix the weights of two domain-specific models for complementarity.
 <p align="left">
+  <img src="figs/pipeline1.png"/ width="100%"> <br>
 </p>
 <p align="left">
+  <img src="figs/pipeline2.png"/ width="100%"> <br>
 </p>
 ## Result
+**Evaluation Prompt Design**
 <p align="left">
+  <img src="figs/table1.png"/ width="100%"> <br>
 </p>
+**Benchmarks on Multimodal Large Language Models**
 <p align="left">
+  <img src="figs/table2.png"/ width="100%"> <br>
+</p>
+**Visual Question Answering**
 <p align="left">
+  <img src="figs/table3.png"/ width="100%"> <br>
 </p>
+**Visual Grounding**
 <p align="left">
+  <img src="figs/table4.png"/ width="100%"> <br>
 </p>
 ## Inference