leaderboard-pt-pr-bot committed on
Commit 1f3060e • 1 Parent(s): 9ebc7c2

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
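
For orientation, each evaluation result this PR adds is one entry under the `model-index` key in the README front matter. The sketch below renders a single entry as YAML text using only the standard library. The function name `render_result_entry` and its parameters are hypothetical, chosen for illustration; this is not the code the bot runs, and the indentation follows the usual model card layout as an assumption.

```python
def render_result_entry(dataset_name, dataset_type, split, num_few_shot,
                        metric_type, metric_value, metric_name, source_url):
    """Return one model-index result entry as YAML text (illustrative helper)."""
    lines = [
        "- task:",
        "    type: text-generation",
        "    name: Text Generation",
        "  dataset:",
        f"    name: {dataset_name}",
        f"    type: {dataset_type}",
        f"    split: {split}",
        "    args:",
        f"      num_few_shot: {num_few_shot}",
        "  metrics:",
        f"  - type: {metric_type}",
        f"    value: {metric_value}",
        f"    name: {metric_name}",
        "  source:",
        f"    url: {source_url}",
        "    name: Open Portuguese LLM Leaderboard",
    ]
    return "\n".join(lines)


# Values copied from the OAB Exams entry in this PR's diff.
entry = render_result_entry(
    dataset_name="OAB Exams",
    dataset_type="eduagarcia/oab_exams",
    split="train",
    num_few_shot=3,
    metric_type="acc",
    metric_value=56.17,
    metric_name="accuracy",
    source_url="https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B",
)
print(entry)
```

Each of the nine benchmarks in the diff repeats this shape, varying only the dataset, split, few-shot count, and metric fields.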

Files changed (1)
  1. README.md +168 -2
README.md CHANGED
@@ -1,9 +1,156 @@
  ---
- base_model: []
  library_name: transformers
  tags:
  - mergekit
  - merge
+ base_model: []
+ model-index:
+ - name: NemoReRemix-12B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 73.55
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 62.17
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 56.17
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 91.6
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 80.23
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 78.26
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 81.95
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 63.75
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 72.51
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=MarinaraSpaghetti/NemoReRemix-12B
+       name: Open Portuguese LLM Leaderboard
  ---
 
  ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6550b16f7490049d6237f200/p0ZBFYc1RNoYcowv3Nj40.jpeg)
@@ -95,4 +242,23 @@ dtype: bfloat16
  # Ko-fi
  ## Enjoying what I do? Consider donating here, thank you!
 
- https://ko-fi.com/spicy_marinara
+ https://ko-fi.com/spicy_marinara
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/MarinaraSpaghetti/NemoReRemix-12B) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**73.36**|
+ |ENEM Challenge (No Images)|    73.55|
+ |BLUEX (No Images)         |    62.17|
+ |OAB Exams                 |    56.17|
+ |Assin2 RTE                |    91.60|
+ |Assin2 STS                |    80.23|
+ |FaQuAD NLI                |    78.26|
+ |HateBR Binary             |    81.95|
+ |PT Hate Speech Binary     |    63.75|
+ |tweetSentBR               |    72.51|
+
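
As a sanity check on the results table above, the nine per-task scores can be averaged directly. This is a minimal sketch: the `scores` dictionary is copied from the table, and the idea that the leaderboard averages unrounded scores (which would account for a last-digit difference from the reported 73.36) is an assumption, not something this PR states.

```python
from statistics import mean

# Per-task scores copied from the results table in the diff.
scores = {
    "ENEM Challenge (No Images)": 73.55,
    "BLUEX (No Images)": 62.17,
    "OAB Exams": 56.17,
    "Assin2 RTE": 91.60,
    "Assin2 STS": 80.23,
    "FaQuAD NLI": 78.26,
    "HateBR Binary": 81.95,
    "PT Hate Speech Binary": 63.75,
    "tweetSentBR": 72.51,
}

REPORTED_AVERAGE = 73.36  # value shown in the "Average" row

average = mean(scores.values())
print(f"mean of the rounded per-task scores: {average:.4f}")

# Assumption: the reported figure comes from unrounded scores, so allow
# a tolerance of one unit in the last displayed digit.
assert abs(average - REPORTED_AVERAGE) < 0.01
```

The tolerance is there because averaging the two-decimal values from the table need not reproduce the reported figure exactly.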