cstr committed
Commit a371b16
1 Parent(s): 22d19b0

Update README.md

Files changed (1):
  1. README.md +81 -0
README.md CHANGED
@@ -13,6 +13,87 @@ base_model:
  Spaetzle-v31-7b is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
  * [yleo/EmertonMonarch-7B](https://huggingface.co/yleo/EmertonMonarch-7B)
 
+ | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+ |--------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+ |[Spaetzle-v31-7b](https://huggingface.co/cstr/Spaetzle-v31-7b)| 46.23| 76.6| 69.58| 46.79| 59.8|
+
+ ### AGIEval
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------|------:|--------|----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |28.74|± | 2.85|
+ | | |acc_norm|27.56|± | 2.81|
+ |agieval_logiqa_en | 0|acc |39.63|± | 1.92|
+ | | |acc_norm|40.25|± | 1.92|
+ |agieval_lsat_ar | 0|acc |24.35|± | 2.84|
+ | | |acc_norm|24.35|± | 2.84|
+ |agieval_lsat_lr | 0|acc |54.31|± | 2.21|
+ | | |acc_norm|54.12|± | 2.21|
+ |agieval_lsat_rc | 0|acc |65.80|± | 2.90|
+ | | |acc_norm|66.54|± | 2.88|
+ |agieval_sat_en | 0|acc |79.13|± | 2.84|
+ | | |acc_norm|79.61|± | 2.81|
+ |agieval_sat_en_without_passage| 0|acc |46.12|± | 3.48|
+ | | |acc_norm|45.15|± | 3.48|
+ |agieval_sat_math | 0|acc |35.00|± | 3.22|
+ | | |acc_norm|32.27|± | 3.16|
+
+ Average: 46.23%
+
+ ### GPT4All
+ | Task |Version| Metric |Value| |Stderr|
+ |-------------|------:|--------|----:|---|-----:|
+ |arc_challenge| 0|acc |64.76|± | 1.40|
+ | | |acc_norm|66.89|± | 1.38|
+ |arc_easy | 0|acc |86.66|± | 0.70|
+ | | |acc_norm|82.83|± | 0.77|
+ |boolq | 1|acc |87.80|± | 0.57|
+ |hellaswag | 0|acc |67.43|± | 0.47|
+ | | |acc_norm|85.85|± | 0.35|
+ |openbookqa | 0|acc |38.00|± | 2.17|
+ | | |acc_norm|48.80|± | 2.24|
+ |piqa | 0|acc |83.57|± | 0.86|
+ | | |acc_norm|84.71|± | 0.84|
+ |winogrande | 0|acc |79.32|± | 1.14|
+
+ Average: 76.6%
+
+ ### TruthfulQA
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |53.37|± | 1.75|
+ | | |mc2 |69.58|± | 1.48|
+
+ Average: 69.58%
+
+ ### Bigbench
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|56.84|± | 3.60|
+ |bigbench_date_understanding | 0|multiple_choice_grade|66.94|± | 2.45|
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|44.57|± | 3.10|
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|21.17|± | 2.16|
+ | | |exact_str_match | 0.28|± | 0.28|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|31.80|± | 2.08|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|22.57|± | 1.58|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|56.00|± | 2.87|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|45.40|± | 2.23|
+ |bigbench_navigate | 0|multiple_choice_grade|52.80|± | 1.58|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|70.65|± | 1.02|
+ |bigbench_ruin_names | 0|multiple_choice_grade|50.67|± | 2.36|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|30.66|± | 1.46|
+ |bigbench_snarks | 0|multiple_choice_grade|71.27|± | 3.37|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|74.34|± | 1.39|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|49.80|± | 1.58|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.16|± | 1.18|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.57|± | 0.93|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|56.00|± | 2.87|
+
+ Average: 46.79%
+
+ Average score: 59.8%
+
+ Elapsed time: 02:09:50
+
  ## 🧩 Configuration
 
  ```yaml
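
For context on the merge statement above: a model produced with LazyMergekit/mergekit loads like any ordinary Hugging Face causal LM. The sketch below is illustrative and not part of this commit; it assumes the `cstr/Spaetzle-v31-7b` repository id shown in the results table, the standard `transformers` API, and an arbitrary choice of dtype and prompt.

```python
# Minimal sketch (not part of this commit): loading the merged model
# with the standard Hugging Face transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cstr/Spaetzle-v31-7b"  # repo id as shown in the results table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 fits the 7B weights on one GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "What is a model merge?"  # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The benchmark tables in this diff follow the lm-evaluation-harness output layout (task, version, metric, value, stderr); the exact harness fork and invocation used to produce them are not stated in the commit.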