tangledgroup/tangled-llama-33m-32k-base-v0.1

@@ -28,7 +28,9 @@ tags:
 A pretrained language model based on the Llama model with about **33M** parameters. This model has been trained on **9.7B** (`9,782,206,713`) tokens from more than **5.2M** (`5,285,575`) dataset rows.
-This model is designed not for immediate use but rather for Continued Pretraining and Finetuning on a downstream task. While it can handle a context length of up to **32K** (`32,768`) tokens, it was pretrained with sequences of **2K** (`2048`) tokens.
 [loss, val_loss](https://api.wandb.ai/links/mtasic85/z591qpyv)
@@ -40,6 +42,10 @@ This model is designed not for immediate use but rather for Continued Pretrainin
 ## lm-evaluation-harness
 |                           Tasks                           |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
 |-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
 |leaderboard                                                |    N/A|      |      |                       |   |      |   |      |
@@ -90,3 +96,87 @@ This model is designed not for immediate use but rather for Continued Pretrainin
 |  - leaderboard_musr_object_placements                     |      1|none  |     0|acc_norm               |↑  |0.2930|±  |0.0285|
 |  - leaderboard_musr_team_allocation                       |      1|none  |     0|acc_norm               |↑  |0.3720|±  |0.0306|

 A pretrained language model based on the Llama model with about **33M** parameters. This model has been trained on **9.7B** (`9,782,206,713`) tokens from more than **5.2M** (`5,285,575`) dataset rows.
+This model **isn't** designed for immediate use; it is intended as a base for continued pretraining and finetuning on a downstream task. While it can handle a context length of up to **32K** (`32,768`) tokens, it was pretrained with sequences of **2K** (`2048`) tokens.
+The objective is a compact model that keeps the core reasoning ability while eliminating redundant knowledge.
 [loss, val_loss](https://api.wandb.ai/links/mtasic85/z591qpyv)
 ## lm-evaluation-harness
+```bash
+litgpt evaluate --tasks 'leaderboard' --out_dir 'evaluate-0/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
+```
 |                           Tasks                           |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
 |-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
 |leaderboard                                                |    N/A|      |      |                       |   |      |   |      |
 |  - leaderboard_musr_object_placements                     |      1|none  |     0|acc_norm               |↑  |0.2930|±  |0.0285|
 |  - leaderboard_musr_team_allocation                       |      1|none  |     0|acc_norm               |↑  |0.3720|±  |0.0306|
+```bash
+litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --out_dir 'evaluate-1/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
+```
+|                 Tasks                 |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
+|---------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
+|arc_challenge                          |      1|none            |     0|acc        |↑  |0.2048|±  |0.0118|
+|                                       |       |none            |     0|acc_norm   |↑  |0.2372|±  |0.0124|
+|gsm8k                                  |      3|flexible-extract|     5|exact_match|↑  |0.0091|±  |0.0026|
+|                                       |       |strict-match    |     5|exact_match|↑  |0.0000|±  |0.0000|
+|hellaswag                              |      1|none            |     0|acc        |↑  |0.2613|±  |0.0044|
+|                                       |       |none            |     0|acc_norm   |↑  |0.2605|±  |0.0044|
+|mmlu                                   |      2|none            |      |acc        |↑  |0.2280|±  |0.0035|
+| - humanities                          |      2|none            |      |acc        |↑  |0.2366|±  |0.0062|
+|  - formal_logic                       |      1|none            |     0|acc        |↑  |0.2778|±  |0.0401|
+|  - high_school_european_history       |      1|none            |     0|acc        |↑  |0.2061|±  |0.0316|
+|  - high_school_us_history             |      1|none            |     0|acc        |↑  |0.2402|±  |0.0300|
+|  - high_school_world_history          |      1|none            |     0|acc        |↑  |0.2827|±  |0.0293|
+|  - international_law                  |      1|none            |     0|acc        |↑  |0.2149|±  |0.0375|
+|  - jurisprudence                      |      1|none            |     0|acc        |↑  |0.2315|±  |0.0408|
+|  - logical_fallacies                  |      1|none            |     0|acc        |↑  |0.2209|±  |0.0326|
+|  - moral_disputes                     |      1|none            |     0|acc        |↑  |0.2370|±  |0.0229|
+|  - moral_scenarios                    |      1|none            |     0|acc        |↑  |0.2380|±  |0.0142|
+|  - philosophy                         |      1|none            |     0|acc        |↑  |0.1833|±  |0.0220|
+|  - prehistory                         |      1|none            |     0|acc        |↑  |0.2191|±  |0.0230|
+|  - professional_law                   |      1|none            |     0|acc        |↑  |0.2360|±  |0.0108|
+|  - world_religions                    |      1|none            |     0|acc        |↑  |0.3275|±  |0.0360|
+| - other                               |      2|none            |      |acc        |↑  |0.2485|±  |0.0077|
+|  - business_ethics                    |      1|none            |     0|acc        |↑  |0.3100|±  |0.0465|
+|  - clinical_knowledge                 |      1|none            |     0|acc        |↑  |0.2226|±  |0.0256|
+|  - college_medicine                   |      1|none            |     0|acc        |↑  |0.2486|±  |0.0330|
+|  - global_facts                       |      1|none            |     0|acc        |↑  |0.1700|±  |0.0378|
+|  - human_aging                        |      1|none            |     0|acc        |↑  |0.3229|±  |0.0314|
+|  - management                         |      1|none            |     0|acc        |↑  |0.1748|±  |0.0376|
+|  - marketing                          |      1|none            |     0|acc        |↑  |0.3034|±  |0.0301|
+|  - medical_genetics                   |      1|none            |     0|acc        |↑  |0.3100|±  |0.0465|
+|  - miscellaneous                      |      1|none            |     0|acc        |↑  |0.2414|±  |0.0153|
+|  - nutrition                          |      1|none            |     0|acc        |↑  |0.2484|±  |0.0247|
+|  - professional_accounting            |      1|none            |     0|acc        |↑  |0.2411|±  |0.0255|
+|  - professional_medicine              |      1|none            |     0|acc        |↑  |0.1838|±  |0.0235|
+|  - virology                           |      1|none            |     0|acc        |↑  |0.2831|±  |0.0351|
+| - social sciences                     |      2|none            |      |acc        |↑  |0.2158|±  |0.0074|
+|  - econometrics                       |      1|none            |     0|acc        |↑  |0.2368|±  |0.0400|
+|  - high_school_geography              |      1|none            |     0|acc        |↑  |0.1768|±  |0.0272|
+|  - high_school_government_and_politics|      1|none            |     0|acc        |↑  |0.1969|±  |0.0287|
+|  - high_school_macroeconomics         |      1|none            |     0|acc        |↑  |0.2103|±  |0.0207|
+|  - high_school_microeconomics         |      1|none            |     0|acc        |↑  |0.2143|±  |0.0267|
+|  - high_school_psychology             |      1|none            |     0|acc        |↑  |0.1890|±  |0.0168|
+|  - human_sexuality                    |      1|none            |     0|acc        |↑  |0.2672|±  |0.0388|
+|  - professional_psychology            |      1|none            |     0|acc        |↑  |0.2451|±  |0.0174|
+|  - public_relations                   |      1|none            |     0|acc        |↑  |0.2091|±  |0.0390|
+|  - security_studies                   |      1|none            |     0|acc        |↑  |0.1755|±  |0.0244|
+|  - sociology                          |      1|none            |     0|acc        |↑  |0.2488|±  |0.0306|
+|  - us_foreign_policy                  |      1|none            |     0|acc        |↑  |0.2700|±  |0.0446|
+| - stem                                |      2|none            |      |acc        |↑  |0.2068|±  |0.0072|
+|  - abstract_algebra                   |      1|none            |     0|acc        |↑  |0.1700|±  |0.0378|
+|  - anatomy                            |      1|none            |     0|acc        |↑  |0.1778|±  |0.0330|
+|  - astronomy                          |      1|none            |     0|acc        |↑  |0.1842|±  |0.0315|
+|  - college_biology                    |      1|none            |     0|acc        |↑  |0.2569|±  |0.0365|
+|  - college_chemistry                  |      1|none            |     0|acc        |↑  |0.1900|±  |0.0394|
+|  - college_computer_science           |      1|none            |     0|acc        |↑  |0.2600|±  |0.0441|
+|  - college_mathematics                |      1|none            |     0|acc        |↑  |0.2100|±  |0.0409|
+|  - college_physics                    |      1|none            |     0|acc        |↑  |0.2059|±  |0.0402|
+|  - computer_security                  |      1|none            |     0|acc        |↑  |0.2400|±  |0.0429|
+|  - conceptual_physics                 |      1|none            |     0|acc        |↑  |0.2681|±  |0.0290|
+|  - electrical_engineering             |      1|none            |     0|acc        |↑  |0.2345|±  |0.0353|
+|  - elementary_mathematics             |      1|none            |     0|acc        |↑  |0.2011|±  |0.0206|
+|  - high_school_biology                |      1|none            |     0|acc        |↑  |0.1839|±  |0.0220|
+|  - high_school_chemistry              |      1|none            |     0|acc        |↑  |0.1527|±  |0.0253|
+|  - high_school_computer_science       |      1|none            |     0|acc        |↑  |0.2400|±  |0.0429|
+|  - high_school_mathematics            |      1|none            |     0|acc        |↑  |0.2111|±  |0.0249|
+|  - high_school_physics                |      1|none            |     0|acc        |↑  |0.1722|±  |0.0308|
+|  - high_school_statistics             |      1|none            |     0|acc        |↑  |0.1667|±  |0.0254|
+|  - machine_learning                   |      1|none            |     0|acc        |↑  |0.2768|±  |0.0425|
+|truthfulqa_mc2                         |      2|none            |     0|acc        |↑  |0.4971|±  |0.0165|
+|winogrande                             |      1|none            |     0|acc        |↑  |0.5114|±  |0.0140|
+
+|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|------------------|------:|------|------|------|---|-----:|---|-----:|
+|mmlu              |      2|none  |      |acc   |↑  |0.2280|±  |0.0035|
+| - humanities     |      2|none  |      |acc   |↑  |0.2366|±  |0.0062|
+| - other          |      2|none  |      |acc   |↑  |0.2485|±  |0.0077|
+| - social sciences|      2|none  |      |acc   |↑  |0.2158|±  |0.0074|
+| - stem           |      2|none  |      |acc   |↑  |0.2068|±  |0.0072|

scripts/TRAIN.md

@@ -57,7 +57,9 @@ model.save_pretrained('out/converted_model/')
 ## Evaluate
 ```bash
-# litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --batch_size 8 out/pretrain/final/
-litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,mmlu_pro,winogrande,arc_challenge,leaderboard,ifeval,mgsm_direct,mathqa,gpqa' --batch_size 8 out/pretrain/final/
 ```

 ## Evaluate
 ```bash
+litgpt evaluate --tasks 'leaderboard' --out_dir 'evaluate-0/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
+litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --out_dir 'evaluate-1/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
+litgpt evaluate --tasks 'mmlu_pro,ifeval,mgsm_direct,mathqa,gpqa' --out_dir 'evaluate-2/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
 ```
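The three evaluation commands above differ only in their task list and output directory, so they could be driven from one small wrapper. The following is a minimal sketch, not part of the repository: `run_eval`, `run_evals`, `DRY_RUN`, and `CHECKPOINT_DIR` are hypothetical names introduced here for illustration, while the `litgpt evaluate` flags and values are copied from the commands above. By default it only prints the commands, so it can be checked without a checkpoint present.

```bash
#!/bin/sh
# Sketch: run the three evaluation groups, one output directory per group.
# DRY_RUN and CHECKPOINT_DIR are hypothetical helper variables, not litgpt options.
set -eu

CHECKPOINT_DIR="${CHECKPOINT_DIR:-out/pretrain/final/}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, anything else = execute them

run_eval() {
  # $1 = index used in the output directory name
  # $2 = comma-separated task list passed to --tasks
  if [ "$DRY_RUN" = "1" ]; then
    echo "litgpt evaluate --tasks $2 --out_dir evaluate-$1/ --batch_size 4 --dtype bfloat16 $CHECKPOINT_DIR"
  else
    litgpt evaluate --tasks "$2" --out_dir "evaluate-$1/" \
      --batch_size 4 --dtype bfloat16 "$CHECKPOINT_DIR"
  fi
}

run_evals() {
  run_eval 0 'leaderboard'
  run_eval 1 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge'
  run_eval 2 'mmlu_pro,ifeval,mgsm_direct,mathqa,gpqa'
}

run_evals
```

Setting `DRY_RUN=0` in the environment would execute the commands instead of printing them. Keeping one output directory per group means a failed or interrupted run does not overwrite the results of another group.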