InferenceIllusionist
committed
Commit: aa03379 • Parent(s): 7cc57ac
Adding previous model scores for comparison
Also slight clean-up and clarification in methodology language
README.md
CHANGED
@@ -125,9 +125,14 @@ An initial foray into the world of fine-tuning. The goal of this release was to
 
 ## Notes & Methodology
 * [Excalibur-7b](https://huggingface.co/InferenceIllusionist/Excalibur-7b) fine-tuned with Direct Preference Optimization (DPO) using Intel/orca_dpo_pairs
-* This is a quick experiment to determine the impact of DPO finetuning on the
+* This is a quick experiment to determine the impact of DPO finetuning on the Excelsior-7b base model
 * Ran for a little over an hour on a single A100
-*
+* Fine-tuning succeeded in making model conversational and more well-rounded
+* Benchmark scores increased in the following categories versus base Excelsior-7b:
+* ARC: 69.71 -> <b>70.9</b>
+* HellaSwag: 87.56 -> <b>87.93</b>
+* TruthfulQA: 67.24 -> <b>70.82</b>
+* Average: 73.6 -> <b>73.84</b>
 * Precision: bfloat16
 
 
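For context on the dataset named in the diff: records in Intel/orca_dpo_pairs carry `system`, `question`, `chosen`, and `rejected` fields, and DPO trainers (e.g. TRL's `DPOTrainer`) consume (prompt, chosen, rejected) triples. The helper below is a minimal illustrative sketch of that mapping, not the script behind this commit; the function name and the prompt-joining convention are assumptions.

```python
def to_dpo_triple(row: dict) -> dict:
    """Map one orca_dpo_pairs-style record to the (prompt, chosen, rejected)
    triple expected by DPO training loops. Purely illustrative."""
    system = row.get("system") or ""
    # Prepend the system message to the user question when one is present.
    prompt = (system + "\n\n" if system else "") + row["question"]
    return {
        "prompt": prompt,
        "chosen": row["chosen"],      # preferred completion
        "rejected": row["rejected"],  # dispreferred completion
    }

# Example record shaped like the dataset's columns:
example = {
    "system": "You are a helpful assistant.",
    "question": "What is DPO?",
    "chosen": "Direct Preference Optimization is a preference-tuning method.",
    "rejected": "I don't know.",
}
triple = to_dpo_triple(example)
```

In a real run this mapping would be applied across the dataset (e.g. `dataset.map(to_dpo_triple)`) before handing it to the trainer.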