This model is a fine-tune of macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo.

Available here.
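For convenience, a minimal usage sketch with Transformers is shown below. The repo ID is a placeholder (it points at the base model; substitute this model's actual repository ID), and the chat template is assumed to be inherited from the SOLAR-10.7B instruct base.

```python
# Minimal usage sketch. Assumptions: the repo ID below is a placeholder
# (the base model), and the chat template is inherited from the SOLAR
# instruct base.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo"  # placeholder; use this model's repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~21 GB of weights at 10.7B parameters
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain DPO fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```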
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| SOLAR-10.7b-Instruct-truthy-dpo | 48.69 | 73.82 | 76.81 | 45.71 | 61.26 |
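The detailed tables that follow are in EleutherAI lm-evaluation-harness output format. As a rough reproduction sketch (the exact harness revision and task IDs behind these numbers are not stated in this card, so both are assumptions), a subset can be scored with the lm-eval >= 0.4 Python API:

```python
# Hedged reproduction sketch, assuming lm-eval >= 0.4. Older harness forks
# commonly used for Nous-style suites name some tasks differently
# (e.g. truthfulqa_mc vs. truthfulqa_mc1/mc2), so per-task IDs and scores
# may not match the tables below exactly.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Placeholder repo ID (the base model); substitute this model's ID.
    model_args="pretrained=macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo,dtype=float16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],  # the GPT4All suite
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```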
### AGIEval

| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± | 2.82 |
|  |  | acc_norm | 27.95 | ± | 2.82 |
| agieval_logiqa_en | 0 | acc | 42.40 | ± | 1.94 |
|  |  | acc_norm | 42.24 | ± | 1.94 |
| agieval_lsat_ar | 0 | acc | 25.65 | ± | 2.89 |
|  |  | acc_norm | 23.91 | ± | 2.82 |
| agieval_lsat_lr | 0 | acc | 54.12 | ± | 2.21 |
|  |  | acc_norm | 54.51 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 69.89 | ± | 2.80 |
|  |  | acc_norm | 69.89 | ± | 2.80 |
| agieval_sat_en | 0 | acc | 80.10 | ± | 2.79 |
|  |  | acc_norm | 80.10 | ± | 2.79 |
| agieval_sat_en_without_passage | 0 | acc | 50.00 | ± | 3.49 |
|  |  | acc_norm | 49.51 | ± | 3.49 |
| agieval_sat_math | 0 | acc | 42.27 | ± | 3.34 |
|  |  | acc_norm | 41.36 | ± | 3.33 |
Average: 48.69%
### GPT4All

| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 59.90 | ± | 1.43 |
|  |  | acc_norm | 63.91 | ± | 1.40 |
| arc_easy | 0 | acc | 80.85 | ± | 0.81 |
|  |  | acc_norm | 78.16 | ± | 0.85 |
| boolq | 1 | acc | 88.20 | ± | 0.56 |
| hellaswag | 0 | acc | 68.34 | ± | 0.46 |
|  |  | acc_norm | 86.39 | ± | 0.34 |
| openbookqa | 0 | acc | 37.60 | ± | 2.17 |
|  |  | acc_norm | 46.80 | ± | 2.23 |
| piqa | 0 | acc | 78.84 | ± | 0.95 |
|  |  | acc_norm | 78.78 | ± | 0.95 |
| winogrande | 0 | acc | 74.51 | ± | 1.22 |
Average: 73.82%
### TruthfulQA

| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 61.81 | ± | 1.70 |
|  |  | mc2 | 76.81 | ± | 1.42 |
Average: 76.81%
### Bigbench

| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 50.53 | ± | 3.64 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± | 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 47.67 | ± | 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 26.18 | ± | 2.32 |
|  |  | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 28.60 | ± | 2.02 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 21.29 | ± | 1.55 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 39.80 | ± | 2.19 |
| bigbench_navigate | 0 | multiple_choice_grade | 63.80 | ± | 1.52 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 59.05 | ± | 1.10 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 40.18 | ± | 2.32 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 46.69 | ± | 1.58 |
| bigbench_snarks | 0 | multiple_choice_grade | 65.19 | ± | 3.55 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 72.41 | ± | 1.42 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 60.30 | ± | 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 25.76 | ± | 1.24 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.43 | ± | 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 47.33 | ± | 2.89 |
Average: 45.71%
Average score: 61.26%
Elapsed time: 02:16:03
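The overall score is the unweighted mean of the four suite averages above, as a quick check confirms:

```python
# Sanity check: the overall "Average score" is the unweighted mean of the
# four per-suite averages reported above.
suite_averages = {"AGIEval": 48.69, "GPT4All": 73.82,
                  "TruthfulQA": 76.81, "Bigbench": 45.71}
overall = sum(suite_averages.values()) / len(suite_averages)
print(f"{overall:.2f}%")  # -> 61.26%
```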
### Open LLM Leaderboard Evaluation Results

Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 74.11 |
| AI2 Reasoning Challenge (25-Shot) | 72.10 |
| HellaSwag (10-Shot) | 88.44 |
| MMLU (5-Shot) | 65.45 |
| TruthfulQA (0-shot) | 76.75 |
| Winogrande (5-shot) | 82.72 |
| GSM8k (5-shot) | 59.21 |