Add files

Files changed (14)

README.md +185 -0
added_tokens.json +1 -0
config.gin +150 -0
config.json +31 -0
flax_model.msgpack +3 -0
model-info.txt +0 -0
pytorch_model.bin +3 -0
special_tokens_map.json +107 -0
spiece.model +3 -0
spiece.vocab +0 -0
tokenizer_config.json +113 -0
train/events.out.tfevents.1669658385.t1v-n-a765f9c4-w-0.2471217.0.v2 +3 -0
training_eval/mc4_nl_ul2_denoising/events.out.tfevents.1669658385.t1v-n-a765f9c4-w-0.2471217.1.v2 +3 -0
training_eval/ul2_mc4_nedd_wiki_news_mix_1/events.out.tfevents.1669658385.t1v-n-a765f9c4-w-0.2471217.2.v2 +3 -0

README.md ADDED

	@@ -0,0 +1,185 @@

+---
+language:
+- nl
+license: apache-2.0
+tags:
+- dutch
+- t5
+- t5x
+- ul2
+- seq2seq
+datasets:
+- yhavinga/mc4_nl_cleaned
+- yhavinga/nedd_wiki_news
+inference: false
+---
+# ul2-base-dutch for Dutch
+Pretrained T5 model on Dutch using a UL2 (Mixture-of-Denoisers) objective.
+The T5 model was introduced in
+[this paper](https://arxiv.org/abs/1910.10683)
+and first released at [this page](https://github.com/google-research/text-to-text-transfer-transformer).
+The UL2 objective was introduced in
+[this paper](https://arxiv.org/abs/2205.05131)
+and first released at [this page](https://github.com/google-research/google-research/tree/master/ul2).
+**Note:** The Hugging Face inference widget is deactivated because this model needs text-to-text fine-tuning on
+a specific downstream task to be useful in practice.
+## Model description
+T5 is an encoder-decoder model and treats all NLP problems in a text-to-text format.
+`ul2-base-dutch` is a T5 model for the Hugging Face `transformers` library, pretrained on a very large corpus of
+Dutch data in a self-supervised fashion.
+This means it was pretrained on the raw texts only, with no humans labelling them in any way
+(which is why it can use lots of publicly available data) with an automatic process to generate
+inputs and outputs from those texts.
+During pretraining, this model used the [T5 v1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) improvements over the original T5 model:
+- GEGLU activation in the feed-forward hidden layer, rather than ReLU - see [here](https://arxiv.org/abs/2002.05202)
+- Dropout was turned off during pre-training. Dropout should be re-enabled during fine-tuning
+- Pre-trained on self-supervised objective only without mixing in the downstream tasks
+- No parameter sharing between embedding and classifier layer
+### UL2 pretraining objective
+This model was pretrained with UL2's Mixture-of-Denoisers (MoD) objective, which combines diverse pre-training
+paradigms together. UL2 frames different objective functions for training language models as denoising tasks, where
+the model has to recover missing sub-sequences of a given input. During pre-training it uses a novel mixture-of-denoisers
+that samples from a varied set of such objectives, each with different configurations. UL2 is trained using a mixture of
+three denoising tasks:
+1. R-denoising (or regular span corruption), which emulates the standard T5 span corruption objective;
+2. X-denoising (or extreme span corruption); and
+3. S-denoising (or sequential PrefixLM).
+During pre-training, we sample from the available denoising tasks based on user-specified ratios.
+UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with a specific pre-training
+denoising task. During pre-training, a paradigm token is inserted into the input
+(`[NLU]` for R-denoising, `[NLG]` for X-denoising, or `[S2S]` for S-denoising) to indicate the denoising task at hand.
+During fine-tuning, the paradigm token that best matches the downstream task should be inserted in the same way
+to get the best performance.
+## Intended uses & limitations
+This model was pretrained in a self-supervised way only, without any supervised training.
+Therefore, unlike Google's original T5 model, it has to be fine-tuned before it is usable on a downstream task
+such as text classification.
+**Note:** You most likely need to fine-tune these T5/UL2 models without mixed precision,
+that is, in full fp32 precision. Fine-tuning with Flax in bf16 - `model.to_bf16()` - is possible
+if you set the mask correctly to exclude layernorm and embedding layers. Also note that the T5x pre-training
+and fine-tuning configs set `z_loss` to 1e-4, an auxiliary loss term that keeps the softmax logits from drifting
+too far from zero, which helps avoid round-off problems in bfloat16.
+You can also find more fine-tuning tips [here](https://discuss.huggingface.co/t/t5-finetuning-tips), for example.
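The `z_loss` term mentioned in the note can be illustrated with a minimal pure-Python sketch. It assumes the common formulation in which `z_loss * log(Z)**2` is added to the cross-entropy, where `Z` is the softmax normalizer; the function name `cross_entropy_with_z_loss` is hypothetical and this is not the T5x implementation itself.

```python
import math


def cross_entropy_with_z_loss(logits, target_index, z_loss=1e-4):
    """Return (total, cross_entropy, z_term) for one token position.

    Assumed formulation: total = CE + z_loss * log(Z)**2,
    where Z = sum(exp(logit)) is the softmax normalizer.
    """
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    cross_entropy = log_z - logits[target_index]
    z_term = z_loss * log_z ** 2
    return cross_entropy + z_term, cross_entropy, z_term


# Shifting all logits by a constant leaves the cross-entropy unchanged
# but increases the z term, so the penalty pulls log(Z) back towards 0.
small = cross_entropy_with_z_loss([0.0, 0.0], target_index=0)
large = cross_entropy_with_z_loss([50.0, 50.0], target_index=0)
```

With the coefficient of 1e-4 the extra term stays small for well-behaved logits and only grows noticeably when the logits drift.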
+**Note**: For fine-tuning, you will most likely get better results if you prefix your input texts
+with one of the tokens `[NLU]`, `[NLG]`, or `[S2S]`.
+For general language understanding fine-tuning tasks, you could use the `[NLU]` token.
+For GPT-style causal language generation, you could use the `[S2S]` token.
+The `[NLG]` token of the X-denoising pretraining task sits somewhere between language understanding and causal language
+generation, so `[NLG]` may also be worth trying for language generation fine-tuning.
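A minimal sketch of how such a prefix could be added when preparing fine-tuning inputs. The helper name `add_mode_prefix`, the lowercase mode keys, and the single space between the token and the text are assumptions made for illustration; the model card does not prescribe an exact format.

```python
# Paradigm tokens named in the model card, keyed by hypothetical mode names.
MODE_TOKENS = {"nlu": "[NLU]", "nlg": "[NLG]", "s2s": "[S2S]"}


def add_mode_prefix(text, mode="nlu"):
    """Prepend the paradigm token for `mode` to `text`."""
    if mode not in MODE_TOKENS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_TOKENS)}")
    # Assumption: token and text are separated by a single space.
    return f"{MODE_TOKENS[mode]} {text}"


examples = ["Dit is een zin.", "Nog een voorbeeld."]
prefixed = [add_mode_prefix(t, mode="nlu") for t in examples]
```

The same helper would be applied at inference time, so that the model sees the prefix it was fine-tuned with.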
+### How to use
+Here is how to use this model in PyTorch:
+```python
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("yhavinga/ul2-base-dutch", use_fast=False)
+model = T5ForConditionalGeneration.from_pretrained("yhavinga/ul2-base-dutch")
+```
+and in Flax:
+```python
+from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("yhavinga/ul2-base-dutch", use_fast=False)
+model = FlaxT5ForConditionalGeneration.from_pretrained("yhavinga/ul2-base-dutch")
+```
+### Limitations and bias
+The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral.
+Therefore, the model can have biased predictions. This bias will also affect all fine-tuned versions of this model.
+## Training data
+The `ul2-base-dutch` T5 model was pre-trained simultaneously on a combination of several datasets,
+including the full version of the "mc4_nl_cleaned" dataset, which is a cleaned version of Common Crawl's web
+crawl corpus, Dutch books, the Dutch subset of Wikipedia (2022-03-20), and a subset of "mc4_nl_cleaned"
+containing only texts from newspapers.
+## Training procedure
+### Preprocessing
+The `ul2-base-dutch` T5 model uses a SentencePiece unigram tokenizer with a vocabulary of 32,000 tokens.
+The tokenizer includes the special tokens `<pad>`, `</s>`, `<unk>`, known from the original T5 paper,
+`[NLU]`, `[NLG]` and `[S2S]` for the MoD pre-training, and `<n>` for newline.
+During pre-training with the UL2 objective, input and output sequences consist of 512 consecutive tokens.
+The tokenizer does not lowercase texts and is therefore case-sensitive; it distinguishes
+between `dutch` and `Dutch`.
+Additionally, 100+28 extra tokens were added for pre-training tasks, resulting in a total of 32,128 tokens.
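The vocabulary arithmetic above can be sketched as follows. The constant names are hypothetical, and the sketch assumes that the 28 `[new_id_*]` tokens directly follow the 100 sentinel tokens, which is consistent with the ids 32100 to 32127 listed in `added_tokens.json` in this commit.

```python
SENTENCEPIECE_PIECES = 32_000  # base SentencePiece vocabulary
SENTINEL_TOKENS = 100          # <extra_id_0> ... <extra_id_99>
NEW_ID_TOKENS = 28             # [new_id_0] ... [new_id_27]


def vocab_layout():
    """Return the total vocabulary size and the assumed [new_id_*] id mapping."""
    total = SENTENCEPIECE_PIECES + SENTINEL_TOKENS + NEW_ID_TOKENS
    first_new_id = SENTENCEPIECE_PIECES + SENTINEL_TOKENS
    new_ids = {f"[new_id_{i}]": first_new_id + i for i in range(NEW_ID_TOKENS)}
    return total, new_ids


total, new_ids = vocab_layout()
```

The total matches the `vocab_size` of 32128 in `config.json`, so the embedding matrix has one row per token with none left over.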
+### Pretraining
+The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/),
+for 1,000,000 steps with a batch size of 128
+(in total 65 B tokens).
+The optimizer used was AdaFactor with a constant learning rate of 1e-2 during the first 10K (warmup) steps,
+followed by an inverse square root decay of the learning rate.
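The learning rate schedule described above can be sketched in a few lines. This is a plain-Python approximation, assuming the schedule is `base_learning_rate / sqrt(max(step, warmup_steps))` with the values from `config.gin` (`base_learning_rate = 1.0`, `warmup_steps = 10000`, factors `'constant * rsqrt_decay'`); the function name `learning_rate` is hypothetical.

```python
import math

BASE_LEARNING_RATE = 1.0  # from config.gin
WARMUP_STEPS = 10_000     # from config.gin


def learning_rate(step):
    """Constant 1e-2 for the first 10K steps, then inverse square root decay."""
    return BASE_LEARNING_RATE / math.sqrt(max(step, WARMUP_STEPS))


schedule = {step: learning_rate(step) for step in (0, 10_000, 40_000, 1_000_000)}
```

Under this assumption the rate is halved every time the step count quadruples, and it ends at roughly 1e-3 after one million steps.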
+The model was trained with Google's Jax/Flax based [t5x framework](https://github.com/google-research/t5x) with help
+from [Stephenn Fernandes](https://huggingface.co/StephennFernandes) to get started writing task definitions that wrap
+HF datasets.
+The UL2 training objective code used with the [t5x framework](https://github.com/google-research/t5x) was copied and
+slightly modified from the [UL2 paper](https://arxiv.org/pdf/2205.05131.pdf) appendix chapter 9.2 by the authors
+of the Finnish ul2 models. The UL2 objective code used is available in the repository
+[Finnish-NLP/ul2-base-nl36-finnish](https://huggingface.co/Finnish-NLP/ul2-base-nl36-finnish) in the files `ul2_objective.py` and `tasks.py`.
+UL2's mixture-of-denoisers configuration was the same as in the UL2 paper,
+except for the mixing rates of the denoisers: 20% for S-denoising (as suggested in chapter 4.5 of the paper),
+with the rest divided equally between R-denoising and X-denoising (i.e. 40% each).
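The mixing rates above can be illustrated with a small sampling sketch. It simplifies the objective to the three denoiser families (the real task definitions in `ul2_objective.py` are more detailed), and the names `DENOISER_RATES`, `sample_denoiser` and `empirical_rates` are hypothetical.

```python
import random

# Mixing rates stated in the model card.
DENOISER_RATES = {"R": 0.4, "X": 0.4, "S": 0.2}
# Paradigm token inserted into the input for each denoiser.
PARADIGM_TOKEN = {"R": "[NLU]", "X": "[NLG]", "S": "[S2S]"}


def sample_denoiser(rng):
    """Pick one denoiser family according to the mixing rates."""
    names = list(DENOISER_RATES)
    weights = [DENOISER_RATES[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def empirical_rates(num_samples, seed=0):
    """Estimate the observed share of each denoiser over `num_samples` draws."""
    rng = random.Random(seed)
    counts = {name: 0 for name in DENOISER_RATES}
    for _ in range(num_samples):
        counts[sample_denoiser(rng)] += 1
    return {name: count / num_samples for name, count in counts.items()}


rates = empirical_rates(20_000)
```

Over many draws the observed shares approach 40%, 40% and 20%, which is what "sampling based on user-specified ratios" amounts to in this simplified view.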
+### Model list
+Models in this series:
+|                      | ul2-base-dutch       | ul2-base-nl36-dutch   | ul2-large-dutch      | ul2-small-dutch      |
+|:---------------------|:---------------------|:----------------------|:---------------------|:---------------------|
+| model_type           | t5                   | t5                    | t5                   | t5                   |
+| _pipeline_tag        | text2text-generation | text2text-generation  | text2text-generation | text2text-generation |
+| d_model              | 768                  | 768                   | 1024                 | 512                  |
+| d_ff                 | 2048                 | 3072                  | 2816                 | 1024                 |
+| num_heads            | 12                   | 12                    | 16                   | 6                    |
+| d_kv                 | 64                   | 64                    | 64                   | 64                   |
+| num_layers           | 12                   | 36                    | 24                   | 8                    |
+| num_decoder_layers   | 12                   | 36                    | 24                   | 8                    |
+| feed_forward_proj    | gated-gelu           | gated-gelu            | gated-gelu           | gated-gelu           |
+| dense_act_fn         | gelu_new             | gelu_new              | gelu_new             | gelu_new             |
+| vocab_size           | 32128                | 32128                 | 32128                | 32128                |
+| tie_word_embeddings  | 0                    | 0                     | 0                    | 0                    |
+| torch_dtype          | float32              | float32               | float32              | float32              |
+| _gin_batch_size      | 128                  | 64                    | 64                   | 128                  |
+| _gin_z_loss          | 0.0001               | 0.0001                | 0.0001               | 0.0001               |
+| _gin_t5_config_dtype | 'bfloat16'           | 'bfloat16'            | 'bfloat16'           | 'bfloat16'           |
+## Evaluation results
+See the evaluation section in the interactive [Pre-training Dutch T5 Models](https://huggingface.co/spaces/yhavinga/pre-training-dutch-t5-models) blog.
+## Acknowledgements
+This project would not have been possible without compute generously provided by Google through the
+[TPU Research Cloud](https://sites.research.google/trc/).
+Thanks to the [Finnish-NLP](https://huggingface.co/Finnish-NLP) authors for releasing their code for the UL2 objective and associated task definitions.
+Thanks to [Stephenn Fernandes](https://huggingface.co/StephennFernandes) for helping me get started with the t5x framework.
+Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)

added_tokens.json ADDED

	@@ -0,0 +1 @@

+ {"[new_id_17]": 32117, "[new_id_20]": 32120, "[new_id_13]": 32113, "[new_id_2]": 32102, "[new_id_16]": 32116, "[new_id_7]": 32107, "[new_id_5]": 32105, "[new_id_1]": 32101, "[new_id_15]": 32115, "[new_id_12]": 32112, "[new_id_0]": 32100, "[new_id_11]": 32111, "[new_id_25]": 32125, "[new_id_24]": 32124, "[new_id_10]": 32110, "[new_id_27]": 32127, "[new_id_23]": 32123, "[new_id_14]": 32114, "[new_id_22]": 32122, "[new_id_21]": 32121, "[new_id_19]": 32119, "[new_id_3]": 32103, "[new_id_4]": 32104, "[new_id_18]": 32118, "[new_id_9]": 32109, "[new_id_8]": 32108, "[new_id_26]": 32126, "[new_id_6]": 32106}

config.gin ADDED

	@@ -0,0 +1,150 @@

+from __gin__ import dynamic_registration
+import __main__ as train_script
+import seqio
+import t5.data.mixtures
+from t5x import adafactor
+from t5x.examples.t5 import network
+from t5x import gin_utils
+from t5x import models
+from t5x import partitioning
+from t5x import trainer
+from t5x import utils
+import tasks.nedd_tasks
+import tasks.ul2_tasks as tasks2
+# Macros:
+# ==============================================================================
+BATCH_SIZE = 128
+DROPOUT_RATE = 0.0
+LABEL_SMOOTHING = 0.0
+LOSS_NORMALIZING_FACTOR = None
+MIXTURE_OR_TASK_MODULE = None
+MIXTURE_OR_TASK_NAME = 'ul2_mc4_nedd_wiki_news_mix_1'
+MODEL = @models.EncoderDecoderModel()
+MODEL_DIR = 'ul2_base_mc4_nedd_wiki_news_nl'
+OPTIMIZER = @adafactor.Adafactor()
+RANDOM_SEED = None
+SHUFFLE_TRAIN_EXAMPLES = True
+TASK_FEATURE_LENGTHS = {'inputs': 512, 'targets': 512}
+TRAIN_STEPS = 1000000
+USE_CACHED_TASKS = False
+USE_HARDWARE_RNG = False
+VOCABULARY = @seqio.SentencePieceVocabulary()
+Z_LOSS = 0.0001
+# Parameters for adafactor.Adafactor:
+# ==============================================================================
+adafactor.Adafactor.decay_rate = 0.8
+adafactor.Adafactor.logical_factor_rules = \
+    @adafactor.standard_logical_factor_rules()
+adafactor.Adafactor.step_offset = 0
+# Parameters for utils.CheckpointConfig:
+# ==============================================================================
+utils.CheckpointConfig.restore = @utils.RestoreCheckpointConfig()
+utils.CheckpointConfig.save = @utils.SaveCheckpointConfig()
+# Parameters for utils.create_learning_rate_scheduler:
+# ==============================================================================
+utils.create_learning_rate_scheduler.base_learning_rate = 1.0
+utils.create_learning_rate_scheduler.factors = 'constant * rsqrt_decay'
+utils.create_learning_rate_scheduler.warmup_steps = 10000
+# Parameters for train/utils.DatasetConfig:
+# ==============================================================================
+train/utils.DatasetConfig.batch_size = %BATCH_SIZE
+train/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
+train/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
+train/utils.DatasetConfig.pack = True
+train/utils.DatasetConfig.seed = None
+train/utils.DatasetConfig.shuffle = %SHUFFLE_TRAIN_EXAMPLES
+train/utils.DatasetConfig.split = 'train'
+train/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
+train/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
+# Parameters for train_eval/utils.DatasetConfig:
+# ==============================================================================
+train_eval/utils.DatasetConfig.batch_size = %BATCH_SIZE
+train_eval/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
+train_eval/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
+train_eval/utils.DatasetConfig.pack = True
+train_eval/utils.DatasetConfig.seed = 42
+train_eval/utils.DatasetConfig.shuffle = False
+train_eval/utils.DatasetConfig.split = 'validation'
+train_eval/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
+train_eval/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
+# Parameters for models.EncoderDecoderModel:
+# ==============================================================================
+models.EncoderDecoderModel.input_vocabulary = %VOCABULARY
+models.EncoderDecoderModel.label_smoothing = %LABEL_SMOOTHING
+models.EncoderDecoderModel.loss_normalizing_factor = %LOSS_NORMALIZING_FACTOR
+models.EncoderDecoderModel.module = @network.Transformer()
+models.EncoderDecoderModel.optimizer_def = %OPTIMIZER
+models.EncoderDecoderModel.output_vocabulary = %VOCABULARY
+models.EncoderDecoderModel.z_loss = %Z_LOSS
+# Parameters for partitioning.PjitPartitioner:
+# ==============================================================================
+partitioning.PjitPartitioner.logical_axis_rules = \
+    @partitioning.standard_logical_axis_rules()
+partitioning.PjitPartitioner.model_parallel_submesh = None
+partitioning.PjitPartitioner.num_partitions = 1
+# Parameters for utils.RestoreCheckpointConfig:
+# ==============================================================================
+utils.RestoreCheckpointConfig.path = []
+# Parameters for utils.SaveCheckpointConfig:
+# ==============================================================================
+utils.SaveCheckpointConfig.dtype = 'float32'
+utils.SaveCheckpointConfig.keep = 4
+utils.SaveCheckpointConfig.period = 50000
+utils.SaveCheckpointConfig.save_dataset = False
+utils.SaveCheckpointConfig.use_gda = False
+# Parameters for seqio.SentencePieceVocabulary:
+# ==============================================================================
+seqio.SentencePieceVocabulary.sentencepiece_model_file = \
+    'gs://t5-dutch-english/vocabs/nedd.32000.128extra/spiece.model'
+# Parameters for network.T5Config:
+# ==============================================================================
+network.T5Config.dropout_rate = %DROPOUT_RATE
+network.T5Config.dtype = 'bfloat16'
+network.T5Config.emb_dim = 768
+network.T5Config.head_dim = 64
+network.T5Config.logits_via_embedding = False
+network.T5Config.mlp_activations = ('gelu', 'linear')
+network.T5Config.mlp_dim = 2048
+network.T5Config.num_decoder_layers = 12
+network.T5Config.num_encoder_layers = 12
+network.T5Config.num_heads = 12
+network.T5Config.vocab_size = 32128
+# Parameters for train_script.train:
+# ==============================================================================
+train_script.train.checkpoint_cfg = @utils.CheckpointConfig()
+train_script.train.eval_period = 2000
+train_script.train.eval_steps = 20
+train_script.train.infer_eval_dataset_cfg = None
+train_script.train.model = %MODEL
+train_script.train.model_dir = %MODEL_DIR
+train_script.train.partitioner = @partitioning.PjitPartitioner()
+train_script.train.random_seed = %RANDOM_SEED
+train_script.train.stats_period = 100
+train_script.train.summarize_config_fn = @gin_utils.summarize_gin_config
+train_script.train.total_steps = %TRAIN_STEPS
+train_script.train.train_dataset_cfg = @train/utils.DatasetConfig()
+train_script.train.train_eval_dataset_cfg = @train_eval/utils.DatasetConfig()
+train_script.train.trainer_cls = @trainer.Trainer
+train_script.train.use_hardware_rng = %USE_HARDWARE_RNG
+# Parameters for trainer.Trainer:
+# ==============================================================================
+trainer.Trainer.learning_rate_fn = @utils.create_learning_rate_scheduler()
+trainer.Trainer.num_microbatches = None
+# Parameters for network.Transformer:
+# ==============================================================================
+network.Transformer.config = @network.T5Config()
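A rough sketch of the training budget implied by the macros in this config. It assumes every packed example fills all 512 input positions, so the token figure is an upper bound on the encoder side only; the function name `training_budget` is hypothetical.

```python
BATCH_SIZE = 128             # BATCH_SIZE macro
INPUT_LENGTH = 512           # TASK_FEATURE_LENGTHS['inputs']
TRAIN_STEPS = 1_000_000      # TRAIN_STEPS macro
CHECKPOINT_PERIOD = 50_000   # utils.SaveCheckpointConfig.period
EVAL_PERIOD = 2_000          # train_script.train.eval_period


def training_budget():
    """Derive simple totals from the gin settings above."""
    return {
        # Upper bound: assumes fully packed 512-token inputs at every step.
        "input_tokens": BATCH_SIZE * INPUT_LENGTH * TRAIN_STEPS,
        "checkpoints_written": TRAIN_STEPS // CHECKPOINT_PERIOD,
        "eval_rounds": TRAIN_STEPS // EVAL_PERIOD,
    }


budget = training_budget()
```

The input token total of about 65.5 billion agrees with the "65 B tokens" stated in the README.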

config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "_name_or_path": "yhavinga/ul2-base-dutch",
+  "architectures": [
+    "T5ForConditionalGeneration"
+  ],
+  "d_ff": 2048,
+  "d_kv": 64,
+  "d_model": 768,
+  "decoder_start_token_id": 0,
+  "dense_act_fn": "gelu_new",
+  "dropout_rate": 0.1,
+  "eos_token_id": 1,
+  "feed_forward_proj": "gated-gelu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "is_gated_act": true,
+  "layer_norm_epsilon": 1e-06,
+  "model_type": "t5",
+  "num_decoder_layers": 12,
+  "num_heads": 12,
+  "num_layers": 12,
+  "output_past": true,
+  "pad_token_id": 0,
+  "relative_attention_max_distance": 128,
+  "relative_attention_num_buckets": 32,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.23.1",
+  "use_cache": true,
+  "vocab_size": 32128
+}
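The parameter count implied by this config can be estimated with a short sketch. It assumes the usual T5 v1.1 layout: bias-free linear layers, a gated feed-forward block with three weight matrices, scale-only layer norms, one relative attention bias table per stack, and untied input and output embeddings. The function name `t5_param_count` is hypothetical.

```python
def t5_param_count(d_model, d_ff, d_kv, num_heads, num_layers,
                   num_decoder_layers, vocab_size, num_buckets=32):
    """Estimate the number of parameters of a gated, untied T5 v1.1 model."""
    inner = num_heads * d_kv
    attention = 4 * d_model * inner       # q, k, v, o projections, no biases
    feed_forward = 3 * d_model * d_ff     # wi_0, wi_1, wo (gated activation)
    encoder_layer = attention + feed_forward + 2 * d_model      # 2 layer norms
    decoder_layer = 2 * attention + feed_forward + 3 * d_model  # 3 layer norms
    relative_bias = num_buckets * num_heads  # assumed: first layer of each stack only
    encoder = num_layers * encoder_layer + relative_bias + d_model
    decoder = num_decoder_layers * decoder_layer + relative_bias + d_model
    embeddings = 2 * vocab_size * d_model  # input embedding + separate lm_head
    return embeddings + encoder + decoder


params = t5_param_count(d_model=768, d_ff=2048, d_kv=64, num_heads=12,
                        num_layers=12, num_decoder_layers=12, vocab_size=32128)
fp32_bytes = params * 4
```

The estimate is about 247.6 million parameters, or roughly 990 MB in float32, which is close to the sizes recorded in the Git LFS pointers for `flax_model.msgpack` and `pytorch_model.bin` in this commit.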

flax_model.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2acad18589b451d7eafbd9333467f969021f3db52d29e2574586ffc204c93d0d
+size 990323615

model-info.txt ADDED

The diff for this file is too large to render.

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e23bf48ddab8aaebc54af60f39b15d6a89810dca3fef7f8373dada0f9b908eb
+size 990402637

special_tokens_map.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}

spiece.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:caa6e2f21aeec181276ab80273e3f869ce303ccb8602d68e0524783c3581092d
+size 800223

spiece.vocab ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,113 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "name_or_path": "yhavinga/ul2-base-dutch",
+  "pad_token": "<pad>",
+  "sp_model_kwargs": {},
+  "special_tokens_map_file": null,
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>",
+  "use_fast_tokenizer": false
+}

train/events.out.tfevents.1669658385.t1v-n-a765f9c4-w-0.2471217.0.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd9260a030255fffa0109a19b21b8c1805097465b5d7b26b70e2f390978e7064
+size 19863940

training_eval/mc4_nl_ul2_denoising/events.out.tfevents.1669658385.t1v-n-a765f9c4-w-0.2471217.1.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6b277063ac1c0bab22374b7e8b5905ff6ab620768a3813f9c8e2a1288bf99b64
+size 879457

training_eval/ul2_mc4_nedd_wiki_news_mix_1/events.out.tfevents.1669658385.t1v-n-a765f9c4-w-0.2471217.2.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2c325c4083853127bffa804eaab3099490e3b486238bfbbd0e6348c00ddedb2f
+size 879457