This model trains T5-V1_1-base on the Norwegian dataset of Oscar. Note that the original configuration is slightly change (dropout is set to 0).
The official run_t5_mlm_flax.py is copied into the repository and is run using the hyperparameters as defined in run_t5.sh.
Training loss can be seen directly on the model card. The full training runs in finished in ca. 4 hours and 30 minutes.