INFO: 2024-11-15 16:53:45,193: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-11-15 16:53:45,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:45,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:53:45,368: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-11-15 16:53:45,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:45,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:53:47,651: llmtf.base.darumeru/PARus: Loading Dataset: 2.28s
INFO: 2024-11-15 16:53:48,903: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.71s
INFO: 2024-11-15 16:53:49,849: llmtf.base.darumeru/PARus: Processing Dataset: 2.20s
INFO: 2024-11-15 16:53:49,849: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-11-15 16:53:49,859: llmtf.base.darumeru/PARus: {'acc': 0.21}
INFO: 2024-11-15 16:53:49,859: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:53:49,859: llmtf.base.evaluator:
mean  darumeru/PARus
0.210  0.210
INFO: 2024-11-15 16:53:58,225: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-11-15 16:53:58,225: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:58,225: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:54:01,621: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.40s
INFO: 2024-11-15 16:54:27,891: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 26.27s
INFO: 2024-11-15 16:54:27,892: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-11-15 16:54:27,903: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.46134020618556704, 'f1_macro': 0.4607981854644543}
INFO: 2024-11-15 16:54:27,909: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:54:27,910: llmtf.base.evaluator:
mean  darumeru/PARus  darumeru/ruOpenBookQA
0.336  0.210  0.461
INFO: 2024-11-15 16:54:36,808: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-11-15 16:54:36,809: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:54:36,809: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:54:41,755: llmtf.base.darumeru/RWSD: Loading Dataset: 4.95s
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: Processing Dataset: 2.49s
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: {'acc': 0.45098039215686275}
INFO: 2024-11-15 16:54:44,249: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:54:44,250: llmtf.base.evaluator:
mean  darumeru/PARus  darumeru/RWSD  darumeru/ruOpenBookQA
0.374  0.210  0.451  0.461
INFO: 2024-11-15 16:54:52,912: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-11-15 16:54:52,912: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:54:52,912: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:56:43,744: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 110.83s
INFO: 2024-11-15 16:59:38,517: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 174.77s
INFO: 2024-11-15 16:59:38,517: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-11-15 16:59:38,578: llmtf.base.nlpcoreteam/ruMMLU:
subject  metric
abstract_algebra  0.240000
anatomy  0.370370
astronomy  0.453947
business_ethics  0.350000
clinical_knowledge  0.475472
college_biology  0.340278
college_chemistry  0.250000
college_computer_science  0.350000
college_mathematics  0.310000
college_medicine  0.427746
college_physics  0.274510
computer_security  0.540000
conceptual_physics  0.331915
econometrics  0.350877
electrical_engineering  0.496552
elementary_mathematics  0.357143
formal_logic  0.317460
global_facts  0.350000
high_school_biology  0.416129
high_school_chemistry  0.438424
high_school_computer_science  0.550000
high_school_european_history  0.472727
high_school_geography  0.484848
high_school_government_and_politics  0.398964
high_school_macroeconomics  0.325641
high_school_mathematics  0.318519
high_school_microeconomics  0.407563
high_school_physics  0.337748
high_school_psychology  0.453211
high_school_statistics  0.342593
high_school_us_history  0.387255
high_school_world_history  0.468354
human_aging  0.448430
human_sexuality  0.396947
international_law  0.661157
jurisprudence  0.490741
logical_fallacies  0.312883
machine_learning  0.339286
management  0.446602
marketing  0.615385
medical_genetics  0.390000
miscellaneous  0.441890
moral_disputes  0.419075
moral_scenarios  0.242458
nutrition  0.411765
philosophy  0.456592
prehistory  0.373457
professional_accounting  0.336879
professional_law  0.314211
professional_medicine  0.330882
professional_psychology  0.364379
public_relations  0.463636
security_studies  0.395918
sociology  0.452736
us_foreign_policy  0.640000
virology  0.361446
world_religions  0.403509
INFO: 2024-11-15 16:59:38,586: llmtf.base.nlpcoreteam/ruMMLU:
subject  metric
STEM  0.371502
humanities  0.409222
other (business, health, misc.)  0.411205
social sciences  0.427893
INFO: 2024-11-15 16:59:38,591: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.4049555256540374}
INFO: 2024-11-15 16:59:38,625: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:59:38,627: llmtf.base.evaluator:
mean  darumeru/PARus  darumeru/RWSD  darumeru/ruOpenBookQA  nlpcoreteam/ruMMLU
0.382  0.210  0.451  0.461  0.405
INFO: 2024-11-15 16:59:47,415: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-11-15 16:59:47,416: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:59:47,416: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:59:50,756: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.34s
INFO: 2024-11-15 17:00:04,037: llmtf.base.darumeru/MultiQ: Processing Dataset: 375.13s
INFO: 2024-11-15 17:00:04,037: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-11-15 17:00:04,038: llmtf.base.darumeru/MultiQ: {'f1': 0.2166827508040571, 'em': 0.11567877629063097}
INFO: 2024-11-15 17:00:04,043: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:04,044: llmtf.base.evaluator:
mean  darumeru/MultiQ  darumeru/PARus  darumeru/RWSD  darumeru/ruOpenBookQA  nlpcoreteam/ruMMLU
0.339  0.166  0.210  0.451  0.461  0.405
INFO: 2024-11-15 17:00:12,617: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-11-15 17:00:12,617: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:12,617: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:15,035: llmtf.base.darumeru/RCB: Loading Dataset: 2.42s
INFO: 2024-11-15 17:00:17,696: llmtf.base.darumeru/RCB: Processing Dataset: 2.66s
INFO: 2024-11-15 17:00:17,696: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-11-15 17:00:17,699: llmtf.base.darumeru/RCB: {'acc': 0.4681818181818182, 'f1_macro': 0.3981025874347421}
INFO: 2024-11-15 17:00:17,700: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:17,701: llmtf.base.evaluator:
mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  nlpcoreteam/ruMMLU
0.354  0.166  0.210  0.433  0.451  0.461  0.405
INFO: 2024-11-15 17:00:26,448: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-11-15 17:00:26,448: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:26,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:28,717: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.27s
INFO: 2024-11-15 17:00:30,112: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 1.39s
INFO: 2024-11-15 17:00:30,112: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-11-15 17:00:30,116: llmtf.base.darumeru/ruWorldTree: {'acc': 0.5619047619047619, 'f1_macro': 0.5621556341232892}
INFO: 2024-11-15 17:00:30,116: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:30,116: llmtf.base.evaluator:
mean  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU
0.384  0.166  0.210  0.433  0.451  0.461  0.562  0.405
INFO: 2024-11-15 17:00:38,826: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-11-15 17:00:38,826: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:38,826: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:51,532: llmtf.base.daru/treewayextractive: Loading Dataset: 12.71s
INFO: 2024-11-15 17:02:39,091: llmtf.base.daru/treewayextractive: Processing Dataset: 107.56s
INFO: 2024-11-15 17:02:39,091: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-11-15 17:02:39,318: llmtf.base.daru/treewayextractive: {'r-prec': 0.3703670274170274}
INFO: 2024-11-15 17:02:39,357: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:02:39,359: llmtf.base.evaluator:
mean  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU
0.382  0.370  0.166  0.210  0.433  0.451  0.461  0.562  0.405
INFO: 2024-11-15 17:02:47,961: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-11-15 17:02:47,961: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:02:47,961: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:03:40,869: llmtf.base.daru/treewayabstractive: Processing Dataset: 230.11s
INFO: 2024-11-15 17:03:40,869: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-11-15 17:03:40,870: llmtf.base.daru/treewayabstractive: {'rouge1': 0.30478241492489133, 'rouge2': 0.09260347651798656}
INFO: 2024-11-15 17:03:40,872: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:03:40,872: llmtf.base.evaluator:
mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU
0.362  0.199  0.370  0.166  0.210  0.433  0.451  0.461  0.562  0.405
INFO: 2024-11-15 17:04:33,322: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 105.36s
INFO: 2024-11-15 17:07:11,362: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 158.04s
INFO: 2024-11-15 17:07:11,362: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-11-15 17:07:11,425: llmtf.base.nlpcoreteam/enMMLU:
subject  metric
abstract_algebra  0.320000
anatomy  0.481481
astronomy  0.480263
business_ethics  0.470000
clinical_knowledge  0.573585
college_biology  0.506944
college_chemistry  0.310000
college_computer_science  0.380000
college_mathematics  0.290000
college_medicine  0.485549
college_physics  0.303922
computer_security  0.640000
conceptual_physics  0.489362
econometrics  0.412281
electrical_engineering  0.551724
elementary_mathematics  0.402116
formal_logic  0.285714
global_facts  0.350000
high_school_biology  0.541935
high_school_chemistry  0.453202
high_school_computer_science  0.670000
high_school_european_history  0.587879
high_school_geography  0.621212
high_school_government_and_politics  0.616580
high_school_macroeconomics  0.443590
high_school_mathematics  0.348148
high_school_microeconomics  0.529412
high_school_physics  0.344371
high_school_psychology  0.642202
high_school_statistics  0.430556
high_school_us_history  0.573529
high_school_world_history  0.594937
human_aging  0.542601
human_sexuality  0.534351
international_law  0.619835
jurisprudence  0.555556
logical_fallacies  0.558282
machine_learning  0.366071
management  0.582524
marketing  0.782051
medical_genetics  0.430000
miscellaneous  0.579821
moral_disputes  0.546243
moral_scenarios  0.241341
nutrition  0.562092
philosophy  0.517685
prehistory  0.518519
professional_accounting  0.414894
professional_law  0.365711
professional_medicine  0.367647
professional_psychology  0.446078
public_relations  0.536364
security_studies  0.538776
sociology  0.592040
us_foreign_policy  0.700000
virology  0.433735
world_religions  0.631579
INFO: 2024-11-15 17:07:11,433: llmtf.base.nlpcoreteam/enMMLU:
subject  metric
STEM  0.434923
humanities  0.507447
other (business, health, misc.)  0.503999
social sciences  0.551074
INFO: 2024-11-15 17:07:11,438: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.4993605425093448}
INFO: 2024-11-15 17:07:11,470: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:07:11,472: llmtf.base.evaluator:
mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.376  0.199  0.370  0.166  0.210  0.433  0.451  0.461  0.562  0.499  0.405
INFO: 2024-11-15 17:07:20,297: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-11-15 17:07:20,298: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:07:20,298: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:07:22,739: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.44s
INFO: 2024-11-15 17:10:45,613: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 202.87s
INFO: 2024-11-15 17:10:45,613: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-11-15 17:10:45,614: llmtf.base.darumeru/cp_para_ru: {'tokens_per_word': 1.8729616000663136, 'symbol_per_token': 4.043905520913176, 'len': 0.927490542336371, 'lcs': 0.05}
INFO: 2024-11-15 17:10:45,615: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:10:45,615: llmtf.base.evaluator:
mean  daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU
0.346  0.199  0.370  0.166  0.210  0.433  0.451  0.050  0.461  0.562  0.499  0.405