yhavinga committed

Commit e7153f8 · 1 parent: 9139139

Update README.md

Files changed (1): README.md (+1 −3)
README.md CHANGED

@@ -53,17 +53,15 @@ which is the original mC4, except
 
 TL;DR: [yhavinga/gpt2-medium-dutch](https://huggingface.co/yhavinga/gpt2-medium-dutch) is the best model.
 
-* `yhavinga/gpt-neo-125M-dutch` is trained on a fraction of C4 containing only wikipedia and news sites.
 * The models with `a`/`b` in the step-column have been trained to step `a` of a total of `b` steps.
 
 | | model | params | train seq len | ppl | loss | batch size | epochs | steps | optim | lr | duration | config |
 |-----------------------------------------------------------------------------------|---------|--------|---------------|------|------|------------|--------|-----------------|-----------|--------|----------|-----------|
-| [yhavinga/gpt-neo-125M-dutch](https://huggingface.co/yhavinga/gpt-neo-125M-dutch) | gpt neo | 125M | 512 | 19.9 | 2.99 | 128 | 8 | 558608 | adamw | 2.4e-3 | 1d 12h | news+wiki |
+| [yhavinga/gpt-neo-125M-dutch](https://huggingface.co/yhavinga/gpt-neo-125M-dutch) | gpt neo | 125M | 512 | 20.9 | 3.04 | 128 | 1 | 190000/558608 | adafactor | 2.4e-3 | 1d 12h | full |
 | [yhavinga/gpt2-medium-dutch](https://huggingface.co/yhavinga/gpt2-medium-dutch) | gpt2 | 345M | 512 | 15.1 | 2.71 | 128 | 4 | 320000/520502 | adafactor | 8e-4 | 7d 2h | full |
 | [yhavinga/gpt2-large-dutch](https://huggingface.co/yhavinga/gpt2-large-dutch) | gpt2 | 762M | 512 | 15.1 | 2.72 | 32 | 1 | 1100000/2082009 | adafactor | 3.3e-5 | 8d 15h | large |
 | [yhavinga/gpt-neo-1.3B-dutch](https://huggingface.co/yhavinga/gpt-neo-1.3B-dutch) | gpt neo | 1.3B | 512 | 16.0 | 2.77 | 16 | 1 | 960000/3049896 | adafactor | 5e-4 | 7d 11h | full |
 
-
 ## Acknowledgements
 
 This project would not have been possible without compute generously provided by Google through the
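Aside: the `ppl` and `loss` columns in the table above are related by ppl = exp(loss), assuming `loss` is the per-token cross-entropy in nats (an assumption, not stated in the diff). A quick sanity check of the updated table values:

```python
import math

# (model, reported loss, reported ppl) copied from the table in the new README
rows = [
    ("yhavinga/gpt-neo-125M-dutch", 3.04, 20.9),
    ("yhavinga/gpt2-medium-dutch", 2.71, 15.1),
    ("yhavinga/gpt2-large-dutch", 2.72, 15.1),
    ("yhavinga/gpt-neo-1.3B-dutch", 2.77, 16.0),
]

for name, loss, ppl in rows:
    # perplexity is e raised to the cross-entropy loss
    print(f"{name}: exp({loss}) = {math.exp(loss):.1f} (reported {ppl})")
```

Each computed value lands within rounding distance of the reported perplexity, which suggests the two columns were measured on the same evaluation set.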