INFO: 2024-11-15 16:53:45,193: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-11-15 16:53:45,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:45,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:53:45,368: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-11-15 16:53:45,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:45,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:53:47,651: llmtf.base.darumeru/PARus: Loading Dataset: 2.28s
INFO: 2024-11-15 16:53:48,903: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.71s
INFO: 2024-11-15 16:53:49,849: llmtf.base.darumeru/PARus: Processing Dataset: 2.20s
INFO: 2024-11-15 16:53:49,849: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-11-15 16:53:49,859: llmtf.base.darumeru/PARus: {'acc': 0.21}
INFO: 2024-11-15 16:53:49,859: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:53:49,859: llmtf.base.evaluator:
mean darumeru/PARus
0.210 0.210
INFO: 2024-11-15 16:53:58,225: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-11-15 16:53:58,225: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:53:58,225: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:54:01,621: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.40s
INFO: 2024-11-15 16:54:27,891: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 26.27s
INFO: 2024-11-15 16:54:27,892: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-11-15 16:54:27,903: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.46134020618556704, 'f1_macro': 0.4607981854644543}
INFO: 2024-11-15 16:54:27,909: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:54:27,910: llmtf.base.evaluator:
mean darumeru/PARus darumeru/ruOpenBookQA
0.336 0.210 0.461
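
After each task the evaluator prints a two-row summary: column names (`mean` first, then every task finished so far) and one row of scores rounded to three decimals. A minimal sketch for reading such a summary back into a dict, assuming columns are separated by whitespace and task names contain no spaces (true for every summary in this log). The helper names `parse_summary` and `mean_is_consistent` are hypothetical, not part of llmtf. Because the printed values are already rounded, the consistency check allows a tolerance of 0.001 rather than demanding an exact match.

```python
def parse_summary(header: str, values: str) -> dict[str, float]:
    """Map each column name in a summary header row to its printed score."""
    names = header.split()
    scores = [float(v) for v in values.split()]
    if len(names) != len(scores):
        raise ValueError(f"{len(names)} columns but {len(scores)} values")
    return dict(zip(names, scores))


def mean_is_consistent(summary: dict[str, float], tol: float = 1e-3) -> bool:
    """Check 'mean' against the average of the (already rounded) task columns."""
    tasks = [score for name, score in summary.items() if name != "mean"]
    return abs(sum(tasks) / len(tasks) - summary["mean"]) <= tol


summary = parse_summary(
    "mean darumeru/PARus darumeru/ruOpenBookQA",
    "0.336 0.210 0.461",
)
print(summary)  # {'mean': 0.336, 'darumeru/PARus': 0.21, 'darumeru/ruOpenBookQA': 0.461}
print(mean_is_consistent(summary))  # True
```

Here the average of the two task columns is 0.3355, within rounding of the printed 0.336.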
INFO: 2024-11-15 16:54:36,808: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-11-15 16:54:36,809: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:54:36,809: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:54:41,755: llmtf.base.darumeru/RWSD: Loading Dataset: 4.95s
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: Processing Dataset: 2.49s
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-11-15 16:54:44,248: llmtf.base.darumeru/RWSD: {'acc': 0.45098039215686275}
INFO: 2024-11-15 16:54:44,249: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:54:44,250: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA
0.374 0.210 0.451 0.461
INFO: 2024-11-15 16:54:52,912: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-11-15 16:54:52,912: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:54:52,912: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:56:43,744: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 110.83s
INFO: 2024-11-15 16:59:38,517: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 174.77s
INFO: 2024-11-15 16:59:38,517: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-11-15 16:59:38,578: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.240000
anatomy 0.370370
astronomy 0.453947
business_ethics 0.350000
clinical_knowledge 0.475472
college_biology 0.340278
college_chemistry 0.250000
college_computer_science 0.350000
college_mathematics 0.310000
college_medicine 0.427746
college_physics 0.274510
computer_security 0.540000
conceptual_physics 0.331915
econometrics 0.350877
electrical_engineering 0.496552
elementary_mathematics 0.357143
formal_logic 0.317460
global_facts 0.350000
high_school_biology 0.416129
high_school_chemistry 0.438424
high_school_computer_science 0.550000
high_school_european_history 0.472727
high_school_geography 0.484848
high_school_government_and_politics 0.398964
high_school_macroeconomics 0.325641
high_school_mathematics 0.318519
high_school_microeconomics 0.407563
high_school_physics 0.337748
high_school_psychology 0.453211
high_school_statistics 0.342593
high_school_us_history 0.387255
high_school_world_history 0.468354
human_aging 0.448430
human_sexuality 0.396947
international_law 0.661157
jurisprudence 0.490741
logical_fallacies 0.312883
machine_learning 0.339286
management 0.446602
marketing 0.615385
medical_genetics 0.390000
miscellaneous 0.441890
moral_disputes 0.419075
moral_scenarios 0.242458
nutrition 0.411765
philosophy 0.456592
prehistory 0.373457
professional_accounting 0.336879
professional_law 0.314211
professional_medicine 0.330882
professional_psychology 0.364379
public_relations 0.463636
security_studies 0.395918
sociology 0.452736
us_foreign_policy 0.640000
virology 0.361446
world_religions 0.403509
INFO: 2024-11-15 16:59:38,586: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.371502
humanities 0.409222
other (business, health, misc.) 0.411205
social sciences 0.427893
INFO: 2024-11-15 16:59:38,591: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.4049555256540374}
INFO: 2024-11-15 16:59:38,625: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 16:59:38,627: llmtf.base.evaluator:
mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU
0.382 0.210 0.451 0.461 0.405
INFO: 2024-11-15 16:59:47,415: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-11-15 16:59:47,416: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 16:59:47,416: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 16:59:50,756: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.34s
INFO: 2024-11-15 17:00:04,037: llmtf.base.darumeru/MultiQ: Processing Dataset: 375.13s
INFO: 2024-11-15 17:00:04,037: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-11-15 17:00:04,038: llmtf.base.darumeru/MultiQ: {'f1': 0.2166827508040571, 'em': 0.11567877629063097}
INFO: 2024-11-15 17:00:04,043: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:04,044: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU
0.339 0.166 0.210 0.451 0.461 0.405
INFO: 2024-11-15 17:00:12,617: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-11-15 17:00:12,617: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:12,617: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:15,035: llmtf.base.darumeru/RCB: Loading Dataset: 2.42s
INFO: 2024-11-15 17:00:17,696: llmtf.base.darumeru/RCB: Processing Dataset: 2.66s
INFO: 2024-11-15 17:00:17,696: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-11-15 17:00:17,699: llmtf.base.darumeru/RCB: {'acc': 0.4681818181818182, 'f1_macro': 0.3981025874347421}
INFO: 2024-11-15 17:00:17,700: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:17,701: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU
0.354 0.166 0.210 0.433 0.451 0.461 0.405
INFO: 2024-11-15 17:00:26,448: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-11-15 17:00:26,448: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:26,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:28,717: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.27s
INFO: 2024-11-15 17:00:30,112: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 1.39s
INFO: 2024-11-15 17:00:30,112: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-11-15 17:00:30,116: llmtf.base.darumeru/ruWorldTree: {'acc': 0.5619047619047619, 'f1_macro': 0.5621556341232892}
INFO: 2024-11-15 17:00:30,116: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:00:30,116: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.384 0.166 0.210 0.433 0.451 0.461 0.562 0.405
INFO: 2024-11-15 17:00:38,826: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-11-15 17:00:38,826: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:00:38,826: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:00:51,532: llmtf.base.daru/treewayextractive: Loading Dataset: 12.71s
INFO: 2024-11-15 17:02:39,091: llmtf.base.daru/treewayextractive: Processing Dataset: 107.56s
INFO: 2024-11-15 17:02:39,091: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-11-15 17:02:39,318: llmtf.base.daru/treewayextractive: {'r-prec': 0.3703670274170274}
INFO: 2024-11-15 17:02:39,357: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:02:39,359: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.382 0.370 0.166 0.210 0.433 0.451 0.461 0.562 0.405
INFO: 2024-11-15 17:02:47,961: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-11-15 17:02:47,961: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:02:47,961: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:03:40,869: llmtf.base.daru/treewayabstractive: Processing Dataset: 230.11s
INFO: 2024-11-15 17:03:40,869: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-11-15 17:03:40,870: llmtf.base.daru/treewayabstractive: {'rouge1': 0.30478241492489133, 'rouge2': 0.09260347651798656}
INFO: 2024-11-15 17:03:40,872: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:03:40,872: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.362 0.199 0.370 0.166 0.210 0.433 0.451 0.461 0.562 0.405
INFO: 2024-11-15 17:04:33,322: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 105.36s
INFO: 2024-11-15 17:07:11,362: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 158.04s
INFO: 2024-11-15 17:07:11,362: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-11-15 17:07:11,425: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.320000
anatomy 0.481481
astronomy 0.480263
business_ethics 0.470000
clinical_knowledge 0.573585
college_biology 0.506944
college_chemistry 0.310000
college_computer_science 0.380000
college_mathematics 0.290000
college_medicine 0.485549
college_physics 0.303922
computer_security 0.640000
conceptual_physics 0.489362
econometrics 0.412281
electrical_engineering 0.551724
elementary_mathematics 0.402116
formal_logic 0.285714
global_facts 0.350000
high_school_biology 0.541935
high_school_chemistry 0.453202
high_school_computer_science 0.670000
high_school_european_history 0.587879
high_school_geography 0.621212
high_school_government_and_politics 0.616580
high_school_macroeconomics 0.443590
high_school_mathematics 0.348148
high_school_microeconomics 0.529412
high_school_physics 0.344371
high_school_psychology 0.642202
high_school_statistics 0.430556
high_school_us_history 0.573529
high_school_world_history 0.594937
human_aging 0.542601
human_sexuality 0.534351
international_law 0.619835
jurisprudence 0.555556
logical_fallacies 0.558282
machine_learning 0.366071
management 0.582524
marketing 0.782051
medical_genetics 0.430000
miscellaneous 0.579821
moral_disputes 0.546243
moral_scenarios 0.241341
nutrition 0.562092
philosophy 0.517685
prehistory 0.518519
professional_accounting 0.414894
professional_law 0.365711
professional_medicine 0.367647
professional_psychology 0.446078
public_relations 0.536364
security_studies 0.538776
sociology 0.592040
us_foreign_policy 0.700000
virology 0.433735
world_religions 0.631579
INFO: 2024-11-15 17:07:11,433: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.434923
humanities 0.507447
other (business, health, misc.) 0.503999
social sciences 0.551074
INFO: 2024-11-15 17:07:11,438: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.4993605425093448}
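
In both MMLU runs the overall `acc` coincides with the unweighted mean of the four category scores printed just above it: (0.371502 + 0.409222 + 0.411205 + 0.427893) / 4 = 0.4049555 for ruMMLU, and (0.434923 + 0.507447 + 0.503999 + 0.551074) / 4 ≈ 0.499361 for enMMLU. This is inferred from the numbers, not stated by the log or taken from llmtf's source. A short check, using the category values as printed (six decimals, hence the 1e-6 tolerance); `category_macro_mean` is a hypothetical helper name:

```python
from statistics import fmean

# Category scores exactly as printed in the ruMMLU and enMMLU category tables.
CATEGORIES = {
    "ruMMLU": {
        "STEM": 0.371502,
        "humanities": 0.409222,
        "other (business, health, misc.)": 0.411205,
        "social sciences": 0.427893,
    },
    "enMMLU": {
        "STEM": 0.434923,
        "humanities": 0.507447,
        "other (business, health, misc.)": 0.503999,
        "social sciences": 0.551074,
    },
}

# Overall 'acc' reported by the log for each benchmark.
REPORTED_ACC = {"ruMMLU": 0.4049555256540374, "enMMLU": 0.4993605425093448}


def category_macro_mean(categories: dict[str, float]) -> float:
    """Unweighted mean over categories: each category counts equally."""
    return fmean(categories.values())


for name, cats in CATEGORIES.items():
    macro = category_macro_mean(cats)
    # Printed category values carry 6 decimals, so allow 1e-6 of rounding slack.
    matches = abs(macro - REPORTED_ACC[name]) < 1e-6
    print(f"{name}: mean of categories {macro:.6f}, reported {REPORTED_ACC[name]:.6f}, match={matches}")
```

An unweighted category mean gives each of the four groups equal weight regardless of how many subjects or questions it contains; whether that is llmtf's intent cannot be read off the log.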
INFO: 2024-11-15 17:07:11,470: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:07:11,472: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.376 0.199 0.370 0.166 0.210 0.433 0.451 0.461 0.562 0.499 0.405
INFO: 2024-11-15 17:07:20,297: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-11-15 17:07:20,298: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:07:20,298: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:07:22,739: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.44s
INFO: 2024-11-15 17:10:45,613: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 202.87s
INFO: 2024-11-15 17:10:45,613: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-11-15 17:10:45,614: llmtf.base.darumeru/cp_para_ru: {'tokens_per_word': 1.8729616000663136, 'symbol_per_token': 4.043905520913176, 'len': 0.927490542336371, 'lcs': 0.05}
INFO: 2024-11-15 17:10:45,615: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:10:45,615: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.346 0.199 0.370 0.166 0.210 0.433 0.451 0.050 0.461 0.562 0.499 0.405
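
The task scores in these summaries can be traced back to the metric dicts logged for each task. Judging from the numbers in this run (llmtf's scoring code is not shown here), a task's summary score is the plain mean of its logged metrics: MultiQ's 0.166 is the mean of f1 0.2167 and em 0.1157, and RCB's 0.433 is the mean of acc 0.4682 and f1_macro 0.3981. darumeru/cp_para_ru is the exception: it is shown as 0.050, its `lcs` value alone, so its other logged values (`tokens_per_word`, `symbol_per_token`, `len`) evidently do not count. The `mean` column is then the unweighted mean over task scores. A sketch that reproduces the final row under those inferred rules; `SCORE_KEYS` is a hypothetical override table encoding the cp_para_ru exception, not an llmtf setting:

```python
from statistics import fmean

# Raw metric dicts as logged for each task in this run.
METRICS = {
    "daru/treewayabstractive": {"rouge1": 0.30478241492489133, "rouge2": 0.09260347651798656},
    "daru/treewayextractive": {"r-prec": 0.3703670274170274},
    "darumeru/MultiQ": {"f1": 0.2166827508040571, "em": 0.11567877629063097},
    "darumeru/PARus": {"acc": 0.21},
    "darumeru/RCB": {"acc": 0.4681818181818182, "f1_macro": 0.3981025874347421},
    "darumeru/RWSD": {"acc": 0.45098039215686275},
    "darumeru/cp_para_ru": {"tokens_per_word": 1.8729616000663136,
                            "symbol_per_token": 4.043905520913176,
                            "len": 0.927490542336371, "lcs": 0.05},
    "darumeru/ruOpenBookQA": {"acc": 0.46134020618556704, "f1_macro": 0.4607981854644543},
    "darumeru/ruWorldTree": {"acc": 0.5619047619047619, "f1_macro": 0.5621556341232892},
    "nlpcoreteam/enMMLU": {"acc": 0.4993605425093448},
    "nlpcoreteam/ruMMLU": {"acc": 0.4049555256540374},
}

# Hypothetical override inferred from the summary table: cp_para_ru is scored on 'lcs' only.
SCORE_KEYS = {"darumeru/cp_para_ru": ["lcs"]}


def task_score(task: str, metrics: dict[str, float]) -> float:
    """Mean of the metrics that count toward a task's summary score."""
    keys = SCORE_KEYS.get(task, list(metrics))
    return fmean(metrics[k] for k in keys)


scores = {task: task_score(task, m) for task, m in METRICS.items()}
overall = fmean(scores.values())  # unweighted mean over tasks

for task, score in scores.items():
    print(f"{task:28s} {score:.3f}")
print(f"{'mean':28s} {overall:.3f}")
```

Without the override, cp_para_ru would average in `tokens_per_word` and `symbol_per_token` and come out at about 1.72, far from the 0.050 shown in the summary.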
|