DongfuJiang committed
Commit 601136d
1 Parent(s): e1f3b59

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -27,7 +27,7 @@ language:
 
 ## Summary
 
-- Mantis is a LLaMA-3-based LMM with **interleaved text and image as inputs**, trained on Mantis-Instruct under academic-level resources (i.e., 36 hours on 16xA100-40G).
+- Mantis-Fuyu is a Fuyu-based LMM with **interleaved text and image as inputs**, trained on Mantis-Instruct under academic-level resources (i.e., 36 hours on 16xA100-40G).
 - Mantis is trained to have multi-image skills, including co-reference, reasoning, comparison, and temporal understanding.
 - Mantis reaches state-of-the-art performance on five multi-image benchmarks (NLVR2, Q-Bench, BLINK, MVBench, Mantis-Eval), and also maintains strong single-image performance on par with CogVLM and Emu2.
 
@@ -58,10 +58,11 @@ image2 = "image2.jpg"
 images = [Image.open(image1), Image.open(image2)]
 
 # load processor and model
-from mantis.models.mllava import MLlavaProcessor, LlavaForConditionalGeneration
-processor = MLlavaProcessor.from_pretrained("TIGER-Lab/Mantis-8B-siglip-llama3")
+# from mantis.models.mllava import MLlavaProcessor, LlavaForConditionalGeneration
+from mantis.models.mfuyu import MFuyuForCausalLM, MFuyuProcessor
+processor = MFuyuProcessor.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu")
 attn_implementation = None # or "flash_attention_2"
-model = LlavaForConditionalGeneration.from_pretrained("TIGER-Lab/Mantis-8B-siglip-llama3", device_map="cuda", torch_dtype=torch.bfloat16, attn_implementation=attn_implementation)
+model = MFuyuForCausalLM.from_pretrained("TIGER-Lab/Mantis-8B-Fuyu", device_map="cuda", torch_dtype=torch.bfloat16, attn_implementation=attn_implementation)
 
 generation_kwargs = {
     "max_new_tokens": 1024,