Commit 1f442bf by weizhiwang · verified · 1 Parent(s): 5f9bc5d

Create README.md

README.md ADDED
@@ -0,0 +1,56 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - Lin-Chen/ShareGPT4V
+ base_model:
+ - Qwen/Qwen2.5-1.5B-Instruct
+ - google/siglip-so400m-patch14-384
+ pipeline_tag: text-generation
+ ---
+
+
+ # MLM-Filter-Qwen2.5-1.5B-GPT4o Model Card
+
+ ## Model details
+
+ **Model type:**
+ MLM-Filter-Qwen2.5-1.5B-GPT4o is an open-source multimodal large language model (MLLM) trained to assess the quality of image-text pairs. It generates four quality scores for each pair: Image-Text Matching, Object Detail Fulfillment, Caption Text Quality, and Semantic Understanding.
+
+ **Model date:**
+ MLM-Filter-Qwen2.5-1.5B-GPT4o was trained in Dec 2024.
+
+ **Paper or resources for more information:**
+ https://mlm-filter.github.io/
+
+ ```
+ @article{wang2024finetuned,
+ title={Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters},
+ author={Wang, Weizhi and Mrini, Khalil and Yang, Linjie and Kumar, Sateesh and Tian, Yu and Yan, Xifeng and Wang, Heng},
+ journal={arXiv preprint arXiv:2403.02677},
+ year={2024}
+ }
+ ```
+
+ ## License
+ Qwen LICENSE AGREEMENT
+
+ **Where to send questions or comments about the model:**
+ https://github.com/Victorwz/MLM_Filter/issues
+
+ ## Intended use
+ **Primary intended uses:**
+ MLM-Filter can be used as a drop-in replacement for CLIPScore in these tasks:
+
+ 1. Score image-text pairs in a large-scale pre-training dataset and filter high-quality subsets based on the scores. When training MLLMs or VLMs, consider using the Image-Text Matching score and the Object Detail Fulfillment score jointly.
+
+ 2. Evaluate image-text alignment for image-to-text or text-to-image generation models.
+
+ 3. Any other application that needs to measure image-text alignment.
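
The score-then-filter use in item 1 can be sketched as follows. This is a minimal illustration, not the project's own pipeline: it assumes the scores have already been produced by the model and stored per pair. The `ScoredPair` record, its field names, the sample values, and the thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredPair:
    """Hypothetical record: one image-text pair with two precomputed scores."""
    image_path: str
    caption: str
    image_text_matching: float        # assumed to come from the filter model
    object_detail_fulfillment: float  # assumed to come from the filter model


def filter_pairs(pairs: List[ScoredPair], itm_min: float, odf_min: float) -> List[ScoredPair]:
    """Keep only pairs whose two scores both meet their thresholds."""
    return [
        p for p in pairs
        if p.image_text_matching >= itm_min
        and p.object_detail_fulfillment >= odf_min
    ]


# Illustrative data; the score values and scale are made up for this sketch.
pairs = [
    ScoredPair("a.jpg", "A dog running on a beach", 90, 85),
    ScoredPair("b.jpg", "Click here for more", 20, 10),
    ScoredPair("c.jpg", "A red car", 88, 40),
]

# Thresholds are placeholders; tune them on your own data.
kept = filter_pairs(pairs, itm_min=80, odf_min=80)
print([p.image_path for p in kept])
```

Requiring both scores to pass is one way to use them jointly; a weighted sum or a top-k cut on each score would be equally valid choices depending on the dataset.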
+
+
+ ## Training dataset (709K)
+ - 665K ShareGPT4V data.
+ - 44K instructions on image-text data quality assessment tasks, covering all 4 metrics.
+
+ ## Usage Sample
+ Please follow the instructions in https://github.com/Victorwz/MLM_Filter.