File size: 5,218 Bytes
34e7363
7cc57ac
34e7363
31b4e1e
 
deee7e1
 
7cc57ac
 
31b4e1e
 
7cc57ac
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
34e7363
 
 
31b4e1e
34e7363
31b4e1e
34e7363
b35fab6
34e7363
0b8beeb
34e7363
5cdd4d8
31b4e1e
 
85478d1
4f4c3d8
f2a7ecb
a0ebbc7
5cdd4d8
 
 
 
59e5718
 
 
5cdd4d8
d3c70c7
04add3f
 
 
7cc57ac
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
---
license: apache-2.0
library_name: transformers
tags:
- finetune
- dpo
- chatml
base_model:
- InferenceIllusionist/Excalibur-7b
datasets:
- Intel/orca_dpo_pairs
model-index:
- name: Excalibur-7b-DPO
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 70.9
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 87.93
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.46
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 70.82
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.48
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.43
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=InferenceIllusionist/Excalibur-7b-DPO
      name: Open LLM Leaderboard
---


# Excalibur-7b-DPO

<img src="https://i.imgur.com/pbPbqq0.jpeg" width="550"/>

An initial foray into the world of fine-tuning. The goal of this release was to amplify the quality of the original model's responses, in particular for vision use cases*

<b>GGUFs available [here](https://huggingface.co/InferenceIllusionist/Excalibur-7b-DPO-GGUF)</b>

## Notes & Methodology
* [Excalibur-7b](https://huggingface.co/InferenceIllusionist/Excalibur-7b) fine-tuned with Direct Preference Optimization (DPO) using Intel/orca_dpo_pairs
* This is a quick experiment to determine the impact of DPO finetuning on the original base model
* Ran for a little over an hour on a single A100
* Internal benchmarks showed improvement over base model, awaiting final results
* Precision: bfloat16


## Sample Question - Vision
<img src="https://i.imgur.com/7aRWtzU.jpeg" width="425"/>

*<b>Requires additional mmproj file. You have two options for vision functionality (available inside this repo):</b>
 * [Quantized - Limited VRAM Option (197mb)](https://huggingface.co/InferenceIllusionist/Excalibur-7b-DPO-GGUF/resolve/main/mistral-7b-mmproj-v1.5-Q4_1.gguf?download=true)
 * [Unquantized - Premium Option / Best Quality (596mb)](https://huggingface.co/InferenceIllusionist/Excalibur-7b-DPO-GGUF/resolve/main/mmproj-model-f16.gguf?download=true)

Select the gguf file of your choice in [Koboldcpp](https://github.com/LostRuins/koboldcpp/releases/) as usual, then make sure to choose the mmproj file above in the LLaVA mmproj field of the model submenu:
<img src="https://i.imgur.com/x8vqH29.png" width="425"/>

## Prompt Format
* For best results please use ChatML for the prompt format. Alpaca may also work.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_InferenceIllusionist__Excalibur-7b-DPO)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |73.84|
|AI2 Reasoning Challenge (25-Shot)|70.90|
|HellaSwag (10-Shot)              |87.93|
|MMLU (5-Shot)                    |65.46|
|TruthfulQA (0-shot)              |70.82|
|Winogrande (5-shot)              |82.48|
|GSM8k (5-shot)                   |65.43|