DongfuJiang committed • Commit 601136d • Parent: e1f3b59
Update README.md

README.md CHANGED
```diff
@@ -27,7 +27,7 @@ language:
 
 ## Summary
 
-- Mantis is
+- Mantis-Fuyu is a Fuyu-based LMM with **interleaved text and image as inputs**, trained on Mantis-Instruct under academic-level resources (i.e., 36 hours on 16xA100-40G).
 - Mantis is trained to have multi-image skills including co-reference, reasoning, comparing, and temporal understanding.
 - Mantis reaches state-of-the-art performance on five multi-image benchmarks (NLVR2, Q-Bench, BLINK, MVBench, Mantis-Eval), and also maintains strong single-image performance on par with CogVLM and Emu2.
 
@@ -58,10 +58,11 @@ image2 = "image2.jpg"
 images = [Image.open(image1), Image.open(image2)]
 
 # load processor and model
-from mantis.models.mllava import MLlavaProcessor, LlavaForConditionalGeneration
-
+# from mantis.models.mllava import MLlavaProcessor, LlavaForConditionalGeneration
+from mantis.models.mfuyu import MFuyuForCausalLM, MFuyuProcessor
+processor = MFuyuProcessor.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu")
 attn_implementation = None  # or "flash_attention_2"
-model =
+model = MFuyuForCausalLM.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu", device_map="cuda", torch_dtype=torch.bfloat16, attn_implementation=attn_implementation)
 
 generation_kwargs = {
     "max_new_tokens": 1024,
```