INFO: 2024-11-15 16:53:45,193: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-11-15 16:53:45,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:45,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:53:45,368: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-11-15 16:53:45,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:45,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:53:47,651: llmtf.base.darumeru/PARus: Loading Dataset: 2.28s
INFO: 2024-11-15 16:53:48,903: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.71s
INFO: 2024-11-15 16:53:49,849: llmtf.base.darumeru/PARus: Processing Dataset: 2.20s
INFO: 2024-11-15 16:53:49,849: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-11-15 16:53:49,859: llmtf.base.darumeru/PARus: {'acc': 0.21}
INFO: 2024-11-15 16:53:49,859: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:53:49,859: llmtf.base.evaluator: 
mean	darumeru/PARus
0.210	0.210
INFO: 2024-11-15 16:53:58,225: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-11-15 16:53:58,225: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:58,225: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:54:01,621: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.40s
INFO: 2024-11-15 16:54:27,891: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 26.27s
INFO: 2024-11-15 16:54:27,892: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-11-15 16:54:27,903: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.46134020618556704, 'f1_macro': 0.4607981854644543}
INFO: 2024-11-15 16:54:27,909: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:54:27,910: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/ruOpenBookQA
0.336	0.210	0.461
INFO: 2024-11-15 16:54:36,808: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-11-15 16:54:36,809: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:54:36,809: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:54:41,755: llmtf.base.darumeru/RWSD: Loading Dataset: 4.95s
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: Processing Dataset: 2.49s
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: {'acc': 0.45098039215686275}
INFO: 2024-11-15 16:54:44,249: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:54:44,250: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/RWSD	darumeru/ruOpenBookQA
0.374	0.210	0.451	0.461
INFO: 2024-11-15 16:54:52,912: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-11-15 16:54:52,912: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:54:52,912: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:56:43,744: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 110.83s
INFO: 2024-11-15 16:59:38,517: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 174.77s
INFO: 2024-11-15 16:59:38,517: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-11-15 16:59:38,578: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.240000
anatomy                              0.370370
astronomy                            0.453947
business_ethics                      0.350000
clinical_knowledge                   0.475472
college_biology                      0.340278
college_chemistry                    0.250000
college_computer_science             0.350000
college_mathematics                  0.310000
college_medicine                     0.427746
college_physics                      0.274510
computer_security                    0.540000
conceptual_physics                   0.331915
econometrics                         0.350877
electrical_engineering               0.496552
elementary_mathematics               0.357143
formal_logic                         0.317460
global_facts                         0.350000
high_school_biology                  0.416129
high_school_chemistry                0.438424
high_school_computer_science         0.550000
high_school_european_history         0.472727
high_school_geography                0.484848
high_school_government_and_politics  0.398964
high_school_macroeconomics           0.325641
high_school_mathematics              0.318519
high_school_microeconomics           0.407563
high_school_physics                  0.337748
high_school_psychology               0.453211
high_school_statistics               0.342593
high_school_us_history               0.387255
high_school_world_history            0.468354
human_aging                          0.448430
human_sexuality                      0.396947
international_law                    0.661157
jurisprudence                        0.490741
logical_fallacies                    0.312883
machine_learning                     0.339286
management                           0.446602
marketing                            0.615385
medical_genetics                     0.390000
miscellaneous                        0.441890
moral_disputes                       0.419075
moral_scenarios                      0.242458
nutrition                            0.411765
philosophy                           0.456592
prehistory                           0.373457
professional_accounting              0.336879
professional_law                     0.314211
professional_medicine                0.330882
professional_psychology              0.364379
public_relations                     0.463636
security_studies                     0.395918
sociology                            0.452736
us_foreign_policy                    0.640000
virology                             0.361446
world_religions                      0.403509
INFO: 2024-11-15 16:59:38,586: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.371502
humanities                       0.409222
other (business, health, misc.)  0.411205
social sciences                  0.427893
INFO: 2024-11-15 16:59:38,591: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.4049555256540374}
INFO: 2024-11-15 16:59:38,625: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:59:38,627: llmtf.base.evaluator: 
mean	darumeru/PARus	darumeru/RWSD	darumeru/ruOpenBookQA	nlpcoreteam/ruMMLU
0.382	0.210	0.451	0.461	0.405
INFO: 2024-11-15 16:59:47,415: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-11-15 16:59:47,416: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:59:47,416: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:59:50,756: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.34s
INFO: 2024-11-15 17:00:04,037: llmtf.base.darumeru/MultiQ: Processing Dataset: 375.13s
INFO: 2024-11-15 17:00:04,037: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-11-15 17:00:04,038: llmtf.base.darumeru/MultiQ: {'f1': 0.2166827508040571, 'em': 0.11567877629063097}
INFO: 2024-11-15 17:00:04,043: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:04,044: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RWSD	darumeru/ruOpenBookQA	nlpcoreteam/ruMMLU
0.339	0.166	0.210	0.451	0.461	0.405
INFO: 2024-11-15 17:00:12,617: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-11-15 17:00:12,617: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:12,617: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:15,035: llmtf.base.darumeru/RCB: Loading Dataset: 2.42s
INFO: 2024-11-15 17:00:17,696: llmtf.base.darumeru/RCB: Processing Dataset: 2.66s
INFO: 2024-11-15 17:00:17,696: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-11-15 17:00:17,699: llmtf.base.darumeru/RCB: {'acc': 0.4681818181818182, 'f1_macro': 0.3981025874347421}
INFO: 2024-11-15 17:00:17,700: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:17,701: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	nlpcoreteam/ruMMLU
0.354	0.166	0.210	0.433	0.451	0.461	0.405
INFO: 2024-11-15 17:00:26,448: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-11-15 17:00:26,448: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:26,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:28,717: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.27s
INFO: 2024-11-15 17:00:30,112: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 1.39s
INFO: 2024-11-15 17:00:30,112: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-11-15 17:00:30,116: llmtf.base.darumeru/ruWorldTree: {'acc': 0.5619047619047619, 'f1_macro': 0.5621556341232892}
INFO: 2024-11-15 17:00:30,116: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:30,116: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/ruMMLU
0.384	0.166	0.210	0.433	0.451	0.461	0.562	0.405
INFO: 2024-11-15 17:00:38,826: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-11-15 17:00:38,826: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:38,826: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:51,532: llmtf.base.daru/treewayextractive: Loading Dataset: 12.71s
INFO: 2024-11-15 17:02:39,091: llmtf.base.daru/treewayextractive: Processing Dataset: 107.56s
INFO: 2024-11-15 17:02:39,091: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-11-15 17:02:39,318: llmtf.base.daru/treewayextractive: {'r-prec': 0.3703670274170274}
INFO: 2024-11-15 17:02:39,357: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:02:39,359: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/ruMMLU
0.382	0.370	0.166	0.210	0.433	0.451	0.461	0.562	0.405
INFO: 2024-11-15 17:02:47,961: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-11-15 17:02:47,961: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:02:47,961: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:03:40,869: llmtf.base.daru/treewayabstractive: Processing Dataset: 230.11s
INFO: 2024-11-15 17:03:40,869: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-11-15 17:03:40,870: llmtf.base.daru/treewayabstractive: {'rouge1': 0.30478241492489133, 'rouge2': 0.09260347651798656}
INFO: 2024-11-15 17:03:40,872: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:03:40,872: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/ruMMLU
0.362	0.199	0.370	0.166	0.210	0.433	0.451	0.461	0.562	0.405
INFO: 2024-11-15 17:04:33,322: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 105.36s
INFO: 2024-11-15 17:07:11,362: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 158.04s
INFO: 2024-11-15 17:07:11,362: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-11-15 17:07:11,425: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.320000
anatomy                              0.481481
astronomy                            0.480263
business_ethics                      0.470000
clinical_knowledge                   0.573585
college_biology                      0.506944
college_chemistry                    0.310000
college_computer_science             0.380000
college_mathematics                  0.290000
college_medicine                     0.485549
college_physics                      0.303922
computer_security                    0.640000
conceptual_physics                   0.489362
econometrics                         0.412281
electrical_engineering               0.551724
elementary_mathematics               0.402116
formal_logic                         0.285714
global_facts                         0.350000
high_school_biology                  0.541935
high_school_chemistry                0.453202
high_school_computer_science         0.670000
high_school_european_history         0.587879
high_school_geography                0.621212
high_school_government_and_politics  0.616580
high_school_macroeconomics           0.443590
high_school_mathematics              0.348148
high_school_microeconomics           0.529412
high_school_physics                  0.344371
high_school_psychology               0.642202
high_school_statistics               0.430556
high_school_us_history               0.573529
high_school_world_history            0.594937
human_aging                          0.542601
human_sexuality                      0.534351
international_law                    0.619835
jurisprudence                        0.555556
logical_fallacies                    0.558282
machine_learning                     0.366071
management                           0.582524
marketing                            0.782051
medical_genetics                     0.430000
miscellaneous                        0.579821
moral_disputes                       0.546243
moral_scenarios                      0.241341
nutrition                            0.562092
philosophy                           0.517685
prehistory                           0.518519
professional_accounting              0.414894
professional_law                     0.365711
professional_medicine                0.367647
professional_psychology              0.446078
public_relations                     0.536364
security_studies                     0.538776
sociology                            0.592040
us_foreign_policy                    0.700000
virology                             0.433735
world_religions                      0.631579
INFO: 2024-11-15 17:07:11,433: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.434923
humanities                       0.507447
other (business, health, misc.)  0.503999
social sciences                  0.551074
INFO: 2024-11-15 17:07:11,438: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.4993605425093448}
INFO: 2024-11-15 17:07:11,470: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:07:11,472: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.376	0.199	0.370	0.166	0.210	0.433	0.451	0.461	0.562	0.499	0.405
INFO: 2024-11-15 17:07:20,297: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-11-15 17:07:20,298: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:07:20,298: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:07:22,739: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.44s
INFO: 2024-11-15 17:10:45,613: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 202.87s
INFO: 2024-11-15 17:10:45,613: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-11-15 17:10:45,614: llmtf.base.darumeru/cp_para_ru: {'tokens_per_word': 1.8729616000663136, 'symbol_per_token': 4.043905520913176, 'len': 0.927490542336371, 'lcs': 0.05}
INFO: 2024-11-15 17:10:45,615: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:10:45,615: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_ru	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.346	0.199	0.370	0.166	0.210	0.433	0.451	0.050	0.461	0.562	0.499	0.405