Commit 4a8aed7 by ykhwang (parent: 29bb3ca)

Update README.md

Files changed (1): README.md (+29, −30)
README.md CHANGED
```diff
@@ -25,11 +25,11 @@ license: cc-by-nc-4.0
 
 ### Pre-training
 
-Pre-training took 9 days using 256 * NVIDIA A100 GPUs. Related settings are listed below.
+Pre-training took 8 days using 256 * NVIDIA A100 GPUs. Related settings are listed below.
 
 | Params | Global batch size\* | Initial learning rate | Train iter.\* | Max length\* | Weight decay |
 | -- | -- | -- | -- | -- | -- |
-| 1.3B | 4.0M | 4E-4 | 1.4T | 8,192 | 0.1 |
+| 1.3B | 4.0M | 4E-4 | 1.4T | 4,096 | 0.1 |
 
 (\* unit: tokens)
 
@@ -51,12 +51,11 @@ We evaluate 42dot-PLM on a variety of academic benchmarks both in Korean and Eng
 
 |Tasks / Macro-F1|[KoGPT2](https://github.com/SKT-AI/KoGPT2) <br>1.2B|[Polyglot-Ko](https://github.com/EleutherAI/polyglot) <br>1.3B|[XGLM](https://huggingface.co/facebook/xglm-1.7B) <br>1.7B|[PolyLM](https://huggingface.co/DAMO-NLP-MT/polylm-1.7b) <br>1.7B|42dot-PLM <br>1.3B|
 |--------------|-----------|----------------|---------|-----------|------------------------|
-|boolq |0.337 |0.355 |**0.502** |0.334 |0.351 |
-|copa |0.67 |**0.721** |0.616 |0.513 |0.711 |
-|hellaswag |0.404 |0.401 |0.374 |0.321 |**0.437** |
-|sentineg |0.606 |0.679 |0.46 |0.382 |**0.711** |
-|**average** |0.504 |0.539 |0.488 |0.388 |**0.553** |
-
+|boolq |0.337 |0.355 |**0.502** |0.334 |0.369 |
+|copa |0.67 |**0.721** |0.616 |0.513 |0.704 |
+|hellaswag |0.404 |0.401 |0.374 |0.321 |**0.431** |
+|sentineg |0.606 |0.679 |0.46 |0.382 |**0.69** |
+|**average** |0.504 |0.539 |0.488 |0.388 |**0.549** |
 
 #### English
 
@@ -64,30 +63,30 @@ We evaluate 42dot-PLM on a variety of academic benchmarks both in Korean and Eng
 <img src="https://huggingface.co/42dot/42dot-plm-1.3b/resolve/main/asset/plm_benchmark_en.png" width="90%" height="90%"/>
 </figure>
 
-| Tasks / Metric | [MPT](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) <br>1B | [OPT](https://huggingface.co/facebook/opt-1.3b) <br>1.3B | XGLM <br>1.7B | PolyLM <br>1.7B | 42dot-PLM <br>1.3B |
+| Tasks / Metric | MPT <br>1B | OPT <br>1.3B | XGLM <br>1.7B | PolyLM <br>1.7B | 42dot-PLM <br>1.3B |
 | ---------------------- | ------ | -------- | --------- | ----------- | ------------------------ |
-| anli_r1/acc | 0.309 | **0.341** | 0.334 | 0.336 | 0.328 |
-| anli_r2/acc | 0.334 | **0.339** | 0.331 | 0.314 | 0.334 |
+| anli_r1/acc | 0.309 | **0.341** | 0.334 | 0.336 | 0.325 |
+| anli_r2/acc | 0.334 | 0.339 | 0.331 | 0.314 | **0.34** |
 | anli_r3/acc | 0.33 | 0.336 | 0.333 | **0.339** | 0.333 |
-| arc_challenge/acc | 0.268 | 0.234 | 0.21 | 0.198 | **0.282** |
-| arc_challenge/acc_norm | 0.291 | 0.295 | 0.243 | 0.256 | **0.314** |
-| arc_easy/acc | 0.608 | 0.571 | 0.537 | 0.461 | **0.623** |
-| arc_easy/acc_norm | 0.555 | 0.51 | 0.479 | 0.404 | **0.561** |
-| boolq/acc | 0.517 | 0.578 | 0.585 | 0.617 | **0.628** |
-| hellaswag/acc | 0.415 | 0.415 | 0.362 | 0.322 | **0.419** |
-| hellaswag/acc_norm | 0.532 | 0.537 | 0.458 | 0.372 | **0.538** |
-| openbookqa/acc | **0.238** | 0.234 | 0.17 | 0.166 | 0.234 |
-| openbookqa/acc_norm | **0.334** | **0.334** | 0.298 | **0.334** | 0.332 |
-| piqa/acc | 0.714 | **0.718** | 0.697 | 0.667 | **0.718** |
-| piqa/acc_norm | 0.72 | **0.724** | 0.703 | 0.649 | **0.724** |
-| record/f1 | 0.84 | **0.857** | 0.775 | 0.681 | 0.85 |
-| record/em | 0.832 | **0.849** | 0.769 | 0.674 | 0.841 |
-| rte/acc | 0.541 | 0.523 | **0.559** | 0.513 | 0.516 |
-| truthfulqa_mc/mc1 | 0.224 | 0.237 | 0.215 | **0.251** | 0.234 |
-| truthfulqa_mc/mc2 | 0.387 | 0.386 | 0.373 | **0.428** | 0.382 |
-| wic/acc | 0.498 | **0.509** | 0.503 | 0.5 | 0.503 |
-| winogrande/acc | 0.574 | **0.595** | 0.55 | 0.519 | 0.575 |
-| **average** | 0.479 | 0.482 | 0.452 | 0.429 | **0.489** |
+| arc_challenge/acc | 0.268 | 0.234 | 0.21 | 0.198 | **0.288** |
+| arc_challenge/acc_norm | 0.291 | 0.295 | 0.243 | 0.256 | **0.317** |
+| arc_easy/acc | 0.608 | 0.571 | 0.537 | 0.461 | **0.628** |
+| arc_easy/acc_norm | 0.555 | 0.51 | 0.479 | 0.404 | **0.564** |
+| boolq/acc | 0.517 | 0.578 | 0.585 | 0.617 | **0.624** |
+| hellaswag/acc | 0.415 | 0.415 | 0.362 | 0.322 | **0.422** |
+| hellaswag/acc_norm | 0.532 | 0.537 | 0.458 | 0.372 | **0.544** |
+| openbookqa/acc | **0.238** | 0.234 | 0.17 | 0.166 | 0.222 |
+| openbookqa/acc_norm | 0.334 | 0.334 | 0.298 | 0.334 | **0.34** |
+| piqa/acc | 0.714 | 0.718 | 0.697 | 0.667 | **0.725** |
+| piqa/acc_norm | 0.72 | 0.724 | 0.703 | 0.649 | **0.727** |
+| record/f1 | 0.84 | **0.857** | 0.775 | 0.681 | 0.848 |
+| record/em | 0.832 | **0.849** | 0.769 | 0.674 | 0.839 |
+| rte/acc | 0.541 | 0.523 | **0.559** | 0.513 | 0.542 |
+| truthfulqa_mc/mc1 | 0.224 | 0.237 | 0.215 | **0.251** | 0.236 |
+| truthfulqa_mc/mc2 | 0.387 | 0.386 | 0.373 | **0.428** | 0.387 |
+| wic/acc | 0.498 | **0.509** | 0.503 | 0.5 | 0.502 |
+| winogrande/acc | 0.574 | **0.595** | 0.55 | 0.519 | 0.583 |
+| **average** | 0.479 | 0.482 | 0.452 | 0.429 | **0.492** |
 
 ## Limitations and Ethical Considerations
 42dot-PLM shares a number of well-known limitations of other large language models (LLMs). For example, it may generate false and misinformative content since 42dot-PLM is also subject to [hallucination](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)). In addition, 42dot-PLM may generate toxic, harmful, and biased content due to the use of web-available training data. We strongly suggest that 42dot-PLM users should be aware of those limitations and take necessary steps to mitigate those issues.
```
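As a sanity check on the corrected pre-training table, the number of optimizer steps implied by the token-denominated settings can be derived. This is a rough sketch, assuming (as the table's footnote states) that both "Train iter." and "Global batch size" are counted in tokens:

```python
# Implied optimizer steps from the pre-training settings table:
# both quantities are in tokens, so their ratio is the step count.
total_tokens = 1.4e12   # "Train iter." = 1.4T tokens
batch_tokens = 4.0e6    # "Global batch size" = 4.0M tokens

steps = total_tokens / batch_tokens
print(f"{steps:,.0f} optimizer steps")  # → 350,000 optimizer steps
```

Note that the commit's change of "Max length" from 8,192 to 4,096 tokens does not alter this figure, since the global batch size is already expressed in tokens.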
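The updated **average** row of the Korean benchmark table can be re-derived from the per-task scores. A quick check, assuming the row is the unweighted mean of the four task Macro-F1 scores (the table reports three decimals, so recomputed means are compared to within rounding error):

```python
# Recompute the "average" row of the updated Korean benchmark table
# as the unweighted mean of the four per-task Macro-F1 scores.
korean = {  # boolq, copa, hellaswag, sentineg
    "KoGPT2 1.2B":      [0.337, 0.67, 0.404, 0.606],
    "Polyglot-Ko 1.3B": [0.355, 0.721, 0.401, 0.679],
    "XGLM 1.7B":        [0.502, 0.616, 0.374, 0.46],
    "PolyLM 1.7B":      [0.334, 0.513, 0.321, 0.382],
    "42dot-PLM 1.3B":   [0.369, 0.704, 0.431, 0.69],
}
reported = {
    "KoGPT2 1.2B": 0.504, "Polyglot-Ko 1.3B": 0.539, "XGLM 1.7B": 0.488,
    "PolyLM 1.7B": 0.388, "42dot-PLM 1.3B": 0.549,
}

for model, scores in korean.items():
    mean = sum(scores) / len(scores)
    # Reported averages are rounded to 3 decimals, so allow half a unit
    # in the last place (plus float slack) when comparing.
    assert abs(mean - reported[model]) < 6e-4, (model, mean)
    print(f"{model}: {mean:.4f}")
```

Each reported average agrees with the recomputed mean, confirming the updated table is internally consistent.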