pipeline_tag: image-text-to-text
library_name: transformers
---

# PDS-DPO-7B-LoRA Model Card

GitHub | arXiv

PDS-DPO-7B is a vision-language model built on LLaVA 1.5 7B and trained with the proposed Preference Data Synthetic Direct Preference Optimization (PDS-DPO) framework. The framework trains on synthetic data produced by generative models and scored by reward models, which serve as proxies for human preferences, to improve alignment, reduce hallucinations, and enhance reasoning capabilities.
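To make the training objective concrete, here is a minimal sketch of the DPO loss used on such preference pairs. The `beta` default below is a common illustrative value, not a hyperparameter reported in this card:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are the summed log-probabilities of the chosen/rejected
    responses under the policy (pi_*) and the frozen reference (ref_*).
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written via log1p/exp for clarity
    return math.log1p(math.exp(-beta * margin))
```

The loss shrinks as the policy widens the log-probability gap in favor of the chosen response, which is what pushes the model toward the reward-model-preferred outputs.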

## Model Details

- Model Name: PDS-DPO-7B-LoRA
- Base Model: LLaVA 1.5 (Vicuna-7B)
- Framework: Preference Data Synthetic Alignment with Direct Preference Optimization (PDS-DPO)
- Dataset: 9K synthetic image-text pairs (positive and negative responses), generated with Stable Diffusion and LLaVA, and scored by reward models such as ImageReward and Llama-3-8B-ArmoRM
- Training Hardware: 2 × A100 GPUs
- Training Optimization: LoRA fine-tuning

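As a rough illustration of what one of the 9K synthetic pairs might look like, the record below uses hypothetical field names; the released dataset's actual schema may differ:

```python
# Hypothetical schema for one synthetic preference pair; field names
# are illustrative assumptions, not the dataset's actual format.
record = {
    "image": "images/00042.png",  # Stable Diffusion sample kept after ImageReward filtering
    "prompt": "Describe the scene in detail.",
    "chosen": "A red bicycle leans against a brick wall...",  # top reward-model score
    "rejected": "Two dogs play in the snow...",               # bottom score (hallucinated)
}

def to_dpo_example(rec):
    """Split a record into the (input, chosen, rejected) triple that DPO consumes."""
    return (rec["image"], rec["prompt"]), rec["chosen"], rec["rejected"]
```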
## Key Features

- Synthetic Data Alignment: uses generative models to produce candidates and reward models for quality control, filtering for the best images and responses so the training data aligns with human preferences.
- Improved Hallucination Control: achieves a significant reduction in hallucination rates on benchmarks such as Object HalBench, MMHal-Bench, and POPE.
- Competitive Benchmark Performance: demonstrates strong results across vision-language tasks such as VQAv2, SQA, MM-Vet, and TextVQA.

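The reward-model quality control above can be sketched as a simple best-versus-worst selection over scored candidates. The function name and `min_gap` threshold are illustrative assumptions, not the framework's actual implementation:

```python
def build_preference_pair(responses, scores, min_gap=0.0):
    """Pick (chosen, rejected) from reward-scored candidate responses.

    responses: candidate answers for one image-prompt pair
    scores:    parallel reward-model scores (higher is better)
    Returns None when the score gap is too small for a clean preference.
    """
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0])
    worst, best = ranked[0], ranked[-1]
    if best[0] - worst[0] <= min_gap:
        return None  # ambiguous pair; skip rather than train on noise
    return best[1], worst[1]  # (chosen, rejected)
```

Skipping low-margin pairs is one way such a pipeline keeps only clearly preferred responses in the DPO training set.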
## Examples

<img src="./images-1.png" alt="fig-1" width="45%"/>
<img src="./images-2.png" alt="fig-2" width="90%"/>

## Citation

```bibtex
@article{2024pdsdpo,
  title={Multimodal Preference Data Synthetic Alignment with Reward Model},
  author={},
  journal={},
  year={}
}
```