---
license: apache-2.0
datasets:
- openbmb/RLAIF-V-Dataset
language:
- en
---

# Model Card for RLAIF-V

[GitHub](https://github.com/RLHF-V/RLAIF-V) | [Paper](https://arxiv.org/abs/2405.17220)

**RLAIF-V-7B** is trained based on LLaVA 1.5 7B with the novel [RLAIF-V](https://github.com/RLHF-V/RLAIF-V) framework. By aligning with human preference via large-scale [AI feedback](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset), the model achieves **trustworthiness surpassing GPT-4V**. RLAIF-V maximally exploits open-source feedback from two key perspectives: high-quality feedback data and an online feedback learning algorithm.

## Model Details

### Key Features

* 📈 **Most Trustworthy LLaVA 1.5**: By learning from open-source AI feedback (specifically, feedback from LLaVA-NeXT-34B), RLAIF-V-7B achieves the largest trustworthiness improvement on LLaVA-v1.5 among hallucination reduction methods.
* 💪 **Maintains Strong Performance on General Abilities**: On benchmarks evaluating general capabilities (e.g. MMStar), RLAIF-V-7B also performs well.
* 🚀 **Inference-time Scaling by Self-guidance**: Using RLAIF-V-7B as a reward model can further improve performance on multiple benchmarks via best-of-N selection (see the sketch after the examples below).
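
A minimal inference sketch is shown below. It assumes the checkpoint can be loaded through the 🤗 Transformers LLaVA classes and follows the LLaVA 1.5 prompt format; these are assumptions, and the [official repository](https://github.com/RLHF-V/RLAIF-V) remains the authoritative inference path.

```python
# Hedged sketch: assumes a Transformers-compatible LLaVA 1.5 checkpoint.
# Direct loading of "openbmb/RLAIF-V-7B" and the prompt template used here
# are assumptions; the RLAIF-V GitHub repo provides the supported code.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "openbmb/RLAIF-V-7B"  # assumption: loadable via Transformers
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")
# LLaVA 1.5 style prompt template (assumption for this checkpoint)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```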
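
The best-of-N procedure from the last key feature can be summarized with a small, generic sketch: sample N candidate answers for a question (and image), score each candidate with the reward model, and keep the highest-scoring one. The `generate_fn` and `reward_fn` callables below are hypothetical placeholders standing in for the policy model and for RLAIF-V-7B acting as the reward model.

```python
# Generic best-of-N selection, a minimal sketch of inference-time scaling.
# `generate_fn` and `reward_fn` are hypothetical placeholders: in practice,
# candidates come from the policy model and scores from RLAIF-V-7B used as
# a reward model (image conditioning omitted here for brevity).
from typing import Callable, List


def best_of_n(
    question: str,
    generate_fn: Callable[[str], str],       # samples one candidate answer
    reward_fn: Callable[[str, str], float],  # scores a (question, answer) pair
    n: int = 8,
) -> str:
    candidates: List[str] = [generate_fn(question) for _ in range(n)]
    scores = [reward_fn(question, c) for c in candidates]
    return candidates[scores.index(max(scores))]


# Toy demo with stand-in functions (real usage plugs in model calls).
if __name__ == "__main__":
    import random

    answers = ["A cat.", "A dog.", "A cat on a sofa."]
    pick = best_of_n(
        "What is in the image?",
        generate_fn=lambda q: random.choice(answers),
        reward_fn=lambda q, a: float(len(a)),  # placeholder score
    )
    print(pick)
```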