leaderboard-pr-bot committed
Commit eaed85d
1 Parent(s): fc951b0

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
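The bot makes its change by editing the YAML front-matter block at the top of README.md (delimited by `---` lines) and appending a results section to the markdown body. A minimal sketch of separating those two parts of a model card — illustrative only; `split_front_matter` and the toy card content are assumptions, not the bot's actual code:

```python
import re

# Toy model card mirroring the structure this PR edits: a YAML front-matter
# block between "---" delimiters, followed by the markdown body.
README = """---
license: llama3
library_name: transformers
---
# Llama-3-70B-Instruct-abliterated-v3.5 Model Card
"""

def split_front_matter(text):
    """Return (front_matter, body); front_matter is '' if none is present."""
    m = re.match(r"^---\n(.*?)\n---\n(.*)\Z", text, re.DOTALL)
    if m is None:
        return "", text
    return m.group(1), m.group(2)

meta, body = split_front_matter(README)
print(meta)                  # the YAML metadata the bot extends with model-index
print(body.split("\n")[0])   # -> "# Llama-3-70B-Instruct-abliterated-v3.5 Model Card"
```

The bot parses the extracted YAML, merges in the `model-index` entries shown in the diff below, and re-serializes the card.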

Files changed (1)
  1. README.md +110 -1
README.md CHANGED
@@ -1,6 +1,101 @@
 ---
-library_name: transformers
 license: llama3
+library_name: transformers
+model-index:
+- name: Meta-Llama-3-70B-Instruct-abliterated-v3.5
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 77.47
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 37.87
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 11.86
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.26
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.97
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 38.36
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
+      name: Open LLM Leaderboard
 ---
 # Llama-3-70B-Instruct-abliterated-v3.5 Model Card
 
@@ -58,3 +153,17 @@ This model may come with interesting quirks, with the methodology being so new.
 If you manage to develop further improvements, please share! This is really the most basic way to use ablation, but there are other possibilities that I believe are as-yet unexplored.
 
 Additionally, feel free to reach out in any way about this. I'm on the Cognitive Computations Discord, I'm watching the Community tab, reach out! I'd love to see this methodology used in other ways, and so would gladly support whoever whenever I can.
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_failspy__Meta-Llama-3-70B-Instruct-abliterated-v3.5)
+
+| Metric             |Value|
+|--------------------|----:|
+| Avg.               |29.97|
+| IFEval (0-Shot)    |77.47|
+| BBH (3-Shot)       |37.87|
+| MATH Lvl 5 (4-Shot)|11.86|
+| GPQA (0-shot)      | 6.26|
+| MuSR (0-shot)      | 7.97|
+| MMLU-PRO (5-shot)  |38.36|
+
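As a sanity check, the reported average follows directly from the six per-benchmark scores in the table. A standalone sketch (the `scores` dict simply copies the values from the diff above):

```python
# Per-benchmark scores copied from the leaderboard table in this PR.
scores = {
    "IFEval (0-Shot)": 77.47,
    "BBH (3-Shot)": 37.87,
    "MATH Lvl 5 (4-Shot)": 11.86,
    "GPQA (0-shot)": 6.26,
    "MuSR (0-shot)": 7.97,
    "MMLU-PRO (5-shot)": 38.36,
}

# The leaderboard "Avg." is the unweighted mean of the six benchmarks.
avg = sum(scores.values()) / len(scores)
print(avg)  # ~29.965, which the table rounds to 29.97
```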