MaziyarPanahi committed on
Commit
5ac1cdf
1 Parent(s): 5a44e1d

Adding Evaluation Results (#5)

- Adding Evaluation Results (f35efbc677955b755610a2f48fdb13ec1f2dd454)

Files changed (1)
  1. README.md +124 -8
README.md CHANGED
@@ -1,5 +1,7 @@
1
  ---
2
- base_model: meta-llama/Meta-Llama-3-70B-Instruct
3
  library_name: transformers
4
  tags:
5
  - axolotl
@@ -11,18 +13,119 @@ tags:
11
  - llama
12
  - llama-3
13
  - chatml
14
- language:
15
- - en
16
  pipeline_tag: text-generation
17
- license: llama3
18
  license_name: llama3
19
  license_link: LICENSE
20
  inference: false
21
  model_creator: MaziyarPanahi
22
- model_name: Llama-3-70B-Instruct-DPO-v0.4
23
  quantized_by: MaziyarPanahi
24
- datasets:
25
- - argilla/ultrafeedback-binarized-preferences
26
  ---
27
 
28
  <img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -156,4 +259,17 @@ Here are the pros and cons of the Docker system:
156
  10. **Vendor Lock-in**: Docker is a proprietary technology, and while it has a large ecosystem, it can be difficult to switch to alternative containerization platforms.
157
 
158
  Overall, Docker provides a powerful and flexible way to deploy and manage applications, but it requires careful planning, configuration, and management to ensure optimal performance and security.
159
- ```
1
  ---
2
+ language:
3
+ - en
4
+ license: llama3
5
  library_name: transformers
6
  tags:
7
  - axolotl
 
13
  - llama
14
  - llama-3
15
  - chatml
16
+ base_model: meta-llama/Meta-Llama-3-70B-Instruct
17
+ datasets:
18
+ - argilla/ultrafeedback-binarized-preferences
19
+ model_name: Llama-3-70B-Instruct-DPO-v0.4
20
  pipeline_tag: text-generation
21
  license_name: llama3
22
  license_link: LICENSE
23
  inference: false
24
  model_creator: MaziyarPanahi
25
  quantized_by: MaziyarPanahi
26
+ model-index:
27
+ - name: Llama-3-70B-Instruct-DPO-v0.4
28
+   results:
29
+   - task:
30
+       type: text-generation
31
+       name: Text Generation
32
+     dataset:
33
+       name: AI2 Reasoning Challenge (25-Shot)
34
+       type: ai2_arc
35
+       config: ARC-Challenge
36
+       split: test
37
+       args:
38
+         num_few_shot: 25
39
+     metrics:
40
+     - type: acc_norm
41
+       value: 72.61
42
+       name: normalized accuracy
43
+     source:
44
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
45
+       name: Open LLM Leaderboard
46
+   - task:
47
+       type: text-generation
48
+       name: Text Generation
49
+     dataset:
50
+       name: HellaSwag (10-Shot)
51
+       type: hellaswag
52
+       split: validation
53
+       args:
54
+         num_few_shot: 10
55
+     metrics:
56
+     - type: acc_norm
57
+       value: 86.03
58
+       name: normalized accuracy
59
+     source:
60
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
61
+       name: Open LLM Leaderboard
62
+   - task:
63
+       type: text-generation
64
+       name: Text Generation
65
+     dataset:
66
+       name: MMLU (5-Shot)
67
+       type: cais/mmlu
68
+       config: all
69
+       split: test
70
+       args:
71
+         num_few_shot: 5
72
+     metrics:
73
+     - type: acc
74
+       value: 80.5
75
+       name: accuracy
76
+     source:
77
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
78
+       name: Open LLM Leaderboard
79
+   - task:
80
+       type: text-generation
81
+       name: Text Generation
82
+     dataset:
83
+       name: TruthfulQA (0-shot)
84
+       type: truthful_qa
85
+       config: multiple_choice
86
+       split: validation
87
+       args:
88
+         num_few_shot: 0
89
+     metrics:
90
+     - type: mc2
91
+       value: 63.26
92
+     source:
93
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
94
+       name: Open LLM Leaderboard
95
+   - task:
96
+       type: text-generation
97
+       name: Text Generation
98
+     dataset:
99
+       name: Winogrande (5-shot)
100
+       type: winogrande
101
+       config: winogrande_xl
102
+       split: validation
103
+       args:
104
+         num_few_shot: 5
105
+     metrics:
106
+     - type: acc
107
+       value: 83.58
108
+       name: accuracy
109
+     source:
110
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
111
+       name: Open LLM Leaderboard
112
+   - task:
113
+       type: text-generation
114
+       name: Text Generation
115
+     dataset:
116
+       name: GSM8k (5-shot)
117
+       type: gsm8k
118
+       config: main
119
+       split: test
120
+       args:
121
+         num_few_shot: 5
122
+     metrics:
123
+     - type: acc
124
+       value: 87.34
125
+       name: accuracy
126
+     source:
127
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
128
+       name: Open LLM Leaderboard
129
  ---
130
 
131
  <img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
259
  10. **Vendor Lock-in**: Docker is a proprietary technology, and while it has a large ecosystem, it can be difficult to switch to alternative containerization platforms.
260
 
261
  Overall, Docker provides a powerful and flexible way to deploy and manage applications, but it requires careful planning, configuration, and management to ensure optimal performance and security.
262
+ ```
263
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
264
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Llama-3-70B-Instruct-DPO-v0.4)
265
+
266
+ | Metric |Value|
267
+ |---------------------------------|----:|
268
+ |Avg. |78.89|
269
+ |AI2 Reasoning Challenge (25-Shot)|72.61|
270
+ |HellaSwag (10-Shot) |86.03|
271
+ |MMLU (5-Shot) |80.50|
272
+ |TruthfulQA (0-shot) |63.26|
273
+ |Winogrande (5-shot) |83.58|
274
+ |GSM8k (5-shot) |87.34|
275
+
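
The `Avg.` row in the table above appears to be the unweighted arithmetic mean of the six benchmark scores. The commit does not state how it was computed, so equal weighting is an assumption here. A minimal sketch to check it, with the `scores` dictionary and variable names chosen for illustration only:

```python
# Benchmark scores copied from the evaluation table in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.61,
    "HellaSwag (10-Shot)": 86.03,
    "MMLU (5-Shot)": 80.50,
    "TruthfulQA (0-shot)": 63.26,
    "Winogrande (5-shot)": 83.58,
    "GSM8k (5-shot)": 87.34,
}

# Assumption: every benchmark carries equal weight in the average.
average = sum(scores.values()) / len(scores)

# Format to two decimals, matching the precision used in the table.
print(f"Avg. {average:.2f}")
```

Under that assumption the result agrees with the 78.89 reported in the table.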