pipeline_tag: image-text-to-text
library_name: transformers
---

# PDS-DPO-7B-LoRA Model Card

GitHub | arXiv

PDS-DPO-7B is a vision-language model built on LLaVA 1.5 7B and trained with the proposed Preference Data Synthetic Direct Preference Optimization (PDS-DPO) framework. The framework trains on synthetic data produced by generative models and scored by reward models, which serve as proxies for human preferences, to improve alignment, reduce hallucinations, and enhance reasoning capabilities.
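To make the training objective concrete, here is a minimal sketch of the DPO loss used on such preference pairs. The `beta` default below is a common illustrative value, not a hyperparameter reported in this card:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are the summed log-probabilities of the chosen/rejected
    responses under the policy (pi_*) and the frozen reference (ref_*).
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written via log1p/exp for clarity
    return math.log1p(math.exp(-beta * margin))
```

The loss shrinks as the policy widens the log-probability gap in favor of the chosen response, which is what pushes the model toward the reward-model-preferred outputs.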

## Model Details

- Model Name: PDS-DPO-7B-LoRA
- Base Model: LLaVA 1.5 (Vicuna-7B)
- Framework: Preference Data Synthetic Alignment with Direct Preference Optimization (PDS-DPO)
- Dataset: 9K synthetic image-text pairs (positive and negative responses), generated with Stable Diffusion and LLaVA, and scored by reward models such as ImageReward and Llama-3-8B-ArmoRM
- Training Hardware: 2 × A100 GPUs
- Training Optimization: LoRA fine-tuning

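As a rough illustration of what one of the 9K synthetic pairs might look like, the record below uses hypothetical field names; the released dataset's actual schema may differ:

```python
# Hypothetical schema for one synthetic preference pair; field names
# are illustrative assumptions, not the dataset's actual format.
record = {
    "image": "images/00042.png",  # Stable Diffusion sample kept after ImageReward filtering
    "prompt": "Describe the scene in detail.",
    "chosen": "A red bicycle leans against a brick wall...",  # top reward-model score
    "rejected": "Two dogs play in the snow...",               # bottom score (hallucinated)
}

def to_dpo_example(rec):
    """Split a record into the (input, chosen, rejected) triple that DPO consumes."""
    return (rec["image"], rec["prompt"]), rec["chosen"], rec["rejected"]
```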
## Key Features

- Synthetic Data Alignment: uses generative models to produce candidates and reward models for quality control, filtering for the best images and responses so the training data aligns with human preferences.
- Improved Hallucination Control: achieves a significant reduction in hallucination rates on benchmarks such as Object HalBench, MMHal-Bench, and POPE.
- Competitive Benchmark Performance: demonstrates strong results across vision-language tasks such as VQAv2, SQA, MM-Vet, and TextVQA.

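The reward-model quality control above can be sketched as a simple best-versus-worst selection over scored candidates. The function name and `min_gap` threshold are illustrative assumptions, not the framework's actual implementation:

```python
def build_preference_pair(responses, scores, min_gap=0.0):
    """Pick (chosen, rejected) from reward-scored candidate responses.

    responses: candidate answers for one image-prompt pair
    scores:    parallel reward-model scores (higher is better)
    Returns None when the score gap is too small for a clean preference.
    """
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0])
    worst, best = ranked[0], ranked[-1]
    if best[0] - worst[0] <= min_gap:
        return None  # ambiguous pair; skip rather than train on noise
    return best[1], worst[1]  # (chosen, rejected)
```

Skipping low-margin pairs is one way such a pipeline keeps only clearly preferred responses in the DPO training set.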
## Examples

<img src="./images-1.png" alt="fig-1" width="45%"/>
<img src="./images-2.png" alt="fig-2" width="90%"/>

## Citation

```bibtex
@article{2024pdsdpo,
  title={Multimodal Preference Data Synthetic Alignment with Reward Model},
  author={},
  journal={},
  year={}
}
```