Update README.md
README.md
CHANGED
@@ -25,11 +25,11 @@ license: cc-by-nc-4.0
 
 ### Pre-training
 
-Pre-training took
+Pre-training took 9 days using 256 * NVIDIA A100 GPUs. Related settings are listed below.
 
 | Params | Global batch size\* | Initial learning rate | Train iter.\* | Max length\* | Weight decay |
 | -- | -- | -- | -- | -- | -- |
-| 1.3B | 4.0M | 4E-4 | 1.0T |
+| 1.3B | 4.0M | 4E-4 | 1.0T | 8K | 0.1 |
 
 (\* unit: tokens)
 
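A quick cross-check of the updated settings table (the step count below is derived here, not stated in the README): a 1.0T-token training budget at a 4.0M-token global batch corresponds to roughly 250,000 optimizer steps.

```python
# Back-of-the-envelope check of the pre-training table above.
# Both inputs are quoted from the table; the step count is derived, not quoted.
global_batch_tokens = 4.0e6    # "Global batch size" (tokens)
total_train_tokens = 1.0e12    # "Train iter." (tokens)

steps = total_train_tokens / global_batch_tokens
print(f"approx. optimizer steps: {steps:,.0f}")  # -> approx. optimizer steps: 250,000
```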
@@ -51,11 +51,11 @@ We evaluate 42dot-PLM on a variety of academic benchmarks both on Korean and Eng
 
 |Tasks / Macro-F1|[KoGPT2](https://github.com/SKT-AI/KoGPT2) <br>1.2B|[Polyglot-Ko](https://github.com/EleutherAI/polyglot) <br>1.3B|[XGLM](https://huggingface.co/facebook/xglm-1.7B) <br>1.7B|[PolyLM](https://huggingface.co/DAMO-NLP-MT/polylm-1.7b) <br>1.7B|42dot-PLM <br>1.3B ko-en|
 |--------------|-----------|----------------|---------|-----------|------------------------|
-|boolq |0.337 |0.355 |**0.502** |0.334 |0.
-|copa |0.67 |**0.721** |0.616 |0.513 |0.
-|hellaswag |0.404 |0.401 |0.374 |0.321 |**0.
-|sentineg |0.606 |0.679 |0.46 |0.382 |**0.
-|**average** |0.504 |0.539 |0.488 |0.388 |**0.
+|boolq |0.337 |0.355 |**0.502** |0.334 |0.351 |
+|copa |0.67 |**0.721** |0.616 |0.513 |0.711 |
+|hellaswag |0.404 |0.401 |0.374 |0.321 |**0.437** |
+|sentineg |0.606 |0.679 |0.46 |0.382 |**0.711** |
+|**average** |0.504 |0.539 |0.488 |0.388 |**0.553** |
 
 
 #### English
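The Korean results above are reported as macro-F1. For readers unfamiliar with the metric, the toy sketch below (made-up labels, unrelated to the benchmark data) shows how macro-F1 averages per-class F1 scores using scikit-learn:

```python
# Illustrative macro-F1 computation with toy labels;
# macro-F1 is the unweighted mean of per-class F1 scores.
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]   # toy gold labels
y_pred = [0, 1, 1, 1, 2, 0]   # toy model predictions

print(f1_score(y_true, y_pred, average="macro"))
```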
@@ -66,28 +66,28 @@ We evaluate 42dot-PLM on a variety of academic benchmarks both on Korean and Eng
 
 | Tasks / Metric | [MPT](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) <br>1B | [OPT](https://huggingface.co/facebook/opt-1.3b) <br>1.3B | XGLM <br>1.7B | PolyLM <br>1.7B | 42dot-PLM <br>1.3B ko-en |
 | ---------------------- | ------ | -------- | --------- | ----------- | ------------------------ |
-| anli_r1/acc | 0.309 | **0.341** | 0.334 | 0.336 | 0.
-| anli_r2/acc | 0.334 | **0.339** | 0.331 | 0.314 | 0.
-| anli_r3/acc | 0.33 | 0.336 | 0.333 | **0.339** | 0.
-| arc_challenge/acc | 0.268 | 0.234 | 0.21 | 0.198 | **0.
-| arc_challenge/acc_norm | 0.291 | 0.295 | 0.243 | 0.256 | **0.
-| arc_easy/acc | 0.608 | 0.571 | 0.537 | 0.461 | **0.
-| arc_easy/acc_norm |
-| boolq/acc | 0.517 | 0.578 | 0.585 | 0.617 | **0.
-| hellaswag/acc |
-| hellaswag/acc_norm | 0.532 |
-| openbookqa/acc | **0.238** | 0.234 | 0.17 | 0.166 | 0.
-| openbookqa/acc_norm | **0.334** | **0.334** | 0.298 | **0.334** | 0.
-| piqa/acc | 0.714 | **0.718** | 0.697 | 0.667 | 0.
-| piqa/acc_norm | 0.72 | **0.724** | 0.703 | 0.649 | 0.
-| record/f1 | 0.84 | **0.857** | 0.775 | 0.681 | 0.
-| record/em | 0.832 | **0.849** | 0.769 | 0.674 | 0.
-| rte/acc | 0.541 | 0.523 | **0.559** | 0.513 | 0.
-| truthfulqa_mc/mc1 | 0.224 | 0.237 | 0.215 | **0.251** | 0.
-| truthfulqa_mc/mc2 | 0.387 | 0.386 | 0.373 | **0.428** | 0.
-| wic/acc | 0.498 | **0.509** | 0.503 | 0.5 | 0.
-| winogrande/acc | 0.574 | **0.595** | 0.55 | 0.519 | 0.
-| **average** | 0.479 | 0.482 | 0.452 | 0.429 | **0.
+| anli_r1/acc | 0.309 | **0.341** | 0.334 | 0.336 | 0.328 |
+| anli_r2/acc | 0.334 | **0.339** | 0.331 | 0.314 | 0.334 |
+| anli_r3/acc | 0.33 | 0.336 | 0.333 | **0.339** | 0.333 |
+| arc_challenge/acc | 0.268 | 0.234 | 0.21 | 0.198 | **0.282** |
+| arc_challenge/acc_norm | 0.291 | 0.295 | 0.243 | 0.256 | **0.314** |
+| arc_easy/acc | 0.608 | 0.571 | 0.537 | 0.461 | **0.623** |
+| arc_easy/acc_norm | 0.555 | 0.51 | 0.479 | 0.404 | **0.561** |
+| boolq/acc | 0.517 | 0.578 | 0.585 | 0.617 | **0.628** |
+| hellaswag/acc | 0.415 | 0.415 | 0.362 | 0.322 | **0.419** |
+| hellaswag/acc_norm | 0.532 | 0.537 | 0.458 | 0.372 | **0.538** |
+| openbookqa/acc | **0.238** | 0.234 | 0.17 | 0.166 | 0.234 |
+| openbookqa/acc_norm | **0.334** | **0.334** | 0.298 | **0.334** | 0.332 |
+| piqa/acc | 0.714 | **0.718** | 0.697 | 0.667 | **0.718** |
+| piqa/acc_norm | 0.72 | **0.724** | 0.703 | 0.649 | **0.724** |
+| record/f1 | 0.84 | **0.857** | 0.775 | 0.681 | 0.85 |
+| record/em | 0.832 | **0.849** | 0.769 | 0.674 | 0.841 |
+| rte/acc | 0.541 | 0.523 | **0.559** | 0.513 | 0.516 |
+| truthfulqa_mc/mc1 | 0.224 | 0.237 | 0.215 | **0.251** | 0.234 |
+| truthfulqa_mc/mc2 | 0.387 | 0.386 | 0.373 | **0.428** | 0.382 |
+| wic/acc | 0.498 | **0.509** | 0.503 | 0.5 | 0.503 |
+| winogrande/acc | 0.574 | **0.595** | 0.55 | 0.519 | 0.575 |
+| **average** | 0.479 | 0.482 | 0.452 | 0.429 | **0.489** |
 
 ## Limitations and Ethical Considerations
 42dot-PLM shares a number of well-known limitations of other large language models (LLMs). For example, it may generate false or misleading content, since 42dot-PLM is also subject to [hallucination](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)). In addition, 42dot-PLM may generate toxic, harmful, and biased content due to its use of web-available training corpora. We strongly suggest that users of 42dot-PLM be aware of these limitations and take the necessary steps to mitigate such issues.
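For readers who want to try the checkpoint whose scores are reported above, here is a minimal usage sketch with Hugging Face `transformers`. The repository id is a placeholder (the actual id is not stated on this page), and the generation settings are purely illustrative:

```python
# Minimal usage sketch. The model id below is a placeholder; check the model card
# for the actual repository name before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "42dot/42dot-PLM-1.3B"  # placeholder id, not confirmed by this README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "인공지능이란"  # "Artificial intelligence is ..." (Korean)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```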
@@ -108,4 +108,4 @@ The 42dot-PLM is licensed under the Creative Commons Attribution-NonCommercial 4
     url = {https://gitlab.42dot.ai/NLP/hyperai/ChatBaker},
     version = {pre-release},
 }
-```
+```