---
language: nl
widget:
- text: "In het jaar 2030 zullen we"
- text: "Toen ik gisteren volledig in de ban was van"
- text: "Studenten en leraren van de Bogazici Universiteit in de Turkse stad Istanbul"
- text: "In Israël was een strenge lockdown"
tags:
- gpt2-large
- gpt2
pipeline_tag: text-generation
datasets:
- yhavinga/mc4_nl_cleaned
---

# GPT2-Large pre-trained on cleaned Dutch mC4 🇳🇱

Dataset:

* [mC4 NL Cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned)
* dataset config: full (33B tokens)

Tokenizer:

* Tokenizer trained on mC4 with scripts from the Hugging Face Transformers [Flax examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)
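
The card does not state the tokenizer type or vocabulary size. The GPT-2 walkthrough in the linked Flax examples trains a byte-level BPE tokenizer with the Hugging Face `tokenizers` library. The toy, pure-Python sketch below illustrates only the core idea of BPE training: count adjacent symbol pairs and repeatedly merge the most frequent one.

It is not the `tokenizers` implementation. It starts from characters rather than bytes, and the names `merge_pair`, `learn_bpe_merges` and `apply_merges` are hypothetical.

```python
from collections import Counter

Pair = tuple[str, str]


def merge_pair(symbols: list[str], pair: Pair) -> list[str]:
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe_merges(corpus: list[str], num_merges: int) -> list[Pair]:
    """Learn up to `num_merges` merges from whitespace-separated text (toy BPE)."""
    # Each distinct word starts as a list of single characters, with its frequency.
    word_freqs = Counter(word for text in corpus for word in text.split())
    splits = {word: list(word) for word in word_freqs}
    merges: list[Pair] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts: Counter = Counter()
        for word, freq in word_freqs.items():
            symbols = splits[word]
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the lexicographically smallest pair.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        splits = {word: merge_pair(symbols, best) for word, symbols in splits.items()}
    return merges


def apply_merges(word: str, merges: list[Pair]) -> list[str]:
    """Tokenize one word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    corpus = ["de kat en de hond", "de kat"]
    merges = learn_bpe_merges(corpus, num_merges=3)
    print(merges)                         # [('d', 'e'), ('a', 't'), ('k', 'at')]
    print(apply_merges("kater", merges))  # ['kat', 'e', 'r']
```

Replaying merges in the order they were learned is what makes tokenization deterministic. An unseen word such as "kater" still reuses the learned "kat" unit.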

Training details:

* Training continued from step 360K (batch size 16, perplexity 21) of an earlier model that was trained with the Adam optimizer.
* Training at step 800K of 2M (38%): perplexity 15.3
* Block size: 512
* Optimizer: adafactor
* Learning rate: 3.3e-5
* Batch size: 32
* Warmup steps: 5000
* Weight decay: 0.01
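
The training details support a few back-of-the-envelope checks. The sketch below rests on three assumptions:

* Batch size 32 is the global batch, i.e. sequences per optimizer step.
* Perplexity is the exponential of the mean per-token cross-entropy in nats, the usual definition.
* The learning rate warms up linearly over the 5000 warmup steps and then decays linearly to zero at step 2M. The card does not state a schedule, so this is purely an assumption.

Under the batch-size assumption, and ignoring the earlier batch-size-16 phase, one step covers 512 × 32 = 16,384 tokens. 2M steps would then cover about 32.8B tokens, close to one pass over the 33B-token full config. The helper names are hypothetical.

```python
import math

# Values copied from the training details in this card.
BLOCK_SIZE = 512          # tokens per sequence
BATCH_SIZE = 32           # sequences per step (assumed to be the global batch)
TOTAL_STEPS = 2_000_000   # planned number of steps
WARMUP_STEPS = 5_000
PEAK_LR = 3.3e-5


def tokens_per_step(block_size: int = BLOCK_SIZE, batch_size: int = BATCH_SIZE) -> int:
    """Tokens consumed by one optimizer step."""
    return block_size * batch_size


def loss_from_perplexity(ppl: float) -> float:
    """Perplexity is exp(mean per-token cross-entropy), so the loss is ln(ppl)."""
    return math.log(ppl)


def learning_rate(step: int) -> float:
    """ASSUMED schedule: linear warmup to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print(tokens_per_step())  # 16384
    print(f"{tokens_per_step() * TOTAL_STEPS / 1e9:.1f}B tokens over {TOTAL_STEPS:,} steps")
    print(f"loss at ppl 21: {loss_from_perplexity(21):.2f}, at ppl 15.3: {loss_from_perplexity(15.3):.2f}")
    print(f"lr at step 800K (assumed schedule): {learning_rate(800_000):.2e}")
```

Read this way, the drop from perplexity 21 to 15.3 corresponds to the mean cross-entropy falling from about 3.04 to about 2.73 nats per token.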

Work in progress. Dec 2021-Jan 2022.

* Many thanks to the [Google TPU Research Cloud](https://sites.research.google/trc/about/) for providing access to a TPU cluster!
* Thanks to @gsarti for creating the [t5-flax-gcp repository](https://github.com/gsarti/t5-flax-gcp).
* Also thanks to the creators of [gpt2-medium-persian](https://huggingface.co/flax-community/gpt2-medium-persian) and [gpt2-medium-indonesian](https://huggingface.co/flax-community/gpt2-medium-indonesian) for sharing their training scripts!