	---
	base_model:
	- InferenceIllusionist/Excalibur-7b
	library_name: transformers
	tags:
	- finetune
	license: apache-2.0
	datasets:
	- Intel/orca_dpo_pairs
	---


	# Excalibur-7b-DPO

	<img src="https://i.imgur.com/pbPbqq0.jpeg" width="550"/>

	An initial foray into the world of fine-tuning. The goal of this release was to improve the quality of the base model's responses, especially in vision use cases.*


	### Notes & Methodology
	* [Excalibur-7b](https://huggingface.co/InferenceIllusionist/Excalibur-7b) fine-tuned with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset
	* This is a quick experiment to determine the impact of DPO fine-tuning on the original base model
	* Training ran for a little over an hour on a single A100
	* Internal benchmarks showed improvement over the base model; final results are pending
	* Precision: bfloat16
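
For readers new to DPO, here is a minimal sketch of the per-pair objective it optimizes. This is not the training script used for this release: the function names, the example log-probabilities, and the `beta=0.1` value are all illustrative assumptions. Each input is the summed log-probability a model assigns to a full response, computed once with the model being trained (the policy) and once with a frozen copy of the base model (the reference).

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from this model card
) -> float:
    """DPO loss for a single (chosen, rejected) preference pair."""
    # Implicit rewards: how far the policy has moved from the reference
    # on each response, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen response gains probability
    # relative to the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)


# Before any training the policy equals the reference, so the margin is 0
# and the loss is log(2).
start = dpo_loss(-12.0, -12.0, -12.0, -12.0)

# Hypothetical numbers: the policy now favors the chosen response.
improved = dpo_loss(-10.0, -15.0, -12.0, -12.0)

print(start, improved)
```

In practice this loss is averaged over batches of prompt, chosen, and rejected triples from the preference dataset, and a training library handles the log-probability computation.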

	*Requires the [mistral-7b-mmproj-v1.5-Q4_1](https://huggingface.co/koboldcpp/mmproj/resolve/main/mistral-7b-mmproj-v1.5-Q4_1.gguf?download=true) file to be loaded in Kobold.