Update README.md
README.md (changed):

```diff
@@ -65,7 +65,7 @@ All the results are the mean of two separate experiments with the same hyper-parameters.
 
 | Training and Evaluation Data | Description |
 | ----------- | ----------- |
-| Datasets | English Wikipedia |
+| Datasets | [English Wikipedia Dataset](https://huggingface.co/datasets/wikipedia) (2500M words). |
 | Motivation | To build an efficient and accurate model for the question answering task. |
 | Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers ([Devlin et al., 2019](https://arxiv.org/abs/1810.04805), [Sanh et al., 2019](https://arxiv.org/abs/1910.01108)). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0.1." |
 
```
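For reference, a minimal sketch (not part of this commit) of how the preprocessing described in the table could look with the Hugging Face `datasets` and `transformers` libraries: load the English Wikipedia dump linked above, split it 95%/5% into train and validation, and tokenize to the model's maximum sequence length while allowing shorter sequences with probability 0.1. The `20220301.en` config, the `distilbert-base-uncased` checkpoint, the seed, and the 32-token lower bound are illustrative assumptions, not values taken from the commit.

```python
# Sketch of the described split and tokenization; names below are assumptions.
import random

from datasets import load_dataset
from transformers import AutoTokenizer

# English Wikipedia from the Hugging Face Hub (config name is an assumption).
wiki = load_dataset("wikipedia", "20220301.en", split="train")

# 95% train / 5% validation, as described in the Preprocessing row.
splits = wiki.train_test_split(test_size=0.05, seed=42)

# Checkpoint is illustrative; any of the compared models could be used here.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
max_len = tokenizer.model_max_length  # maximum sequence length allowed by the model


def tokenize(batch):
    # With probability 0.1, truncate this batch to a shorter random length,
    # mirroring "we allow shorter sequences at a probability of 0.1".
    length = max_len if random.random() >= 0.1 else random.randint(32, max_len)
    return tokenizer(batch["text"], truncation=True, max_length=length)


train = splits["train"].map(tokenize, batched=True, remove_columns=wiki.column_names)
valid = splits["test"].map(tokenize, batched=True, remove_columns=wiki.column_names)
```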