bconsolvo committed
Commit 2c15921 · 1 Parent(s): d6fe490

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -65,7 +65,7 @@ All the results are the mean of two seperate experiments with the same hyper-par
 
 | Training and Evaluation Data | Description |
 | ----------- | ----------- |
-| Datasets | English Wikipedia dataset (2500M words). |
+| Datasets | [English Wikipedia Dataset](https://huggingface.co/datasets/wikipedia) (2500M words). |
 | Motivation | To build an efficient and accurate model for the question answering task. |
 | Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers ([Devlin et al., 2019](https://arxiv.org/abs/1810.04805), [Sanh et al., 2019](https://arxiv.org/abs/1910.01108)). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0:1." |
 
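
The Preprocessing row in the changed table describes a 95% train / 5% validation split of the newly linked English Wikipedia dataset. As a rough sketch (not part of this commit), the split could be reproduced with the Hugging Face `datasets` library along the lines below; the `20220301.en` config name and the seed are assumptions, and the full sequence-length packing from the BERT/DistilBERT papers is not shown.

```python
# Sketch only: load the linked English Wikipedia dataset and reproduce the
# 95% train / 5% validation split described in the table above.
from datasets import load_dataset

# The "20220301.en" config name is an assumption; any English Wikipedia dump works similarly.
wiki = load_dataset("wikipedia", "20220301.en", split="train")

# 95/5 split matching the preprocessing description (seed chosen arbitrarily here).
splits = wiki.train_test_split(test_size=0.05, seed=42)
train_ds, val_ds = splits["train"], splits["test"]

print(len(train_ds), len(val_ds))
```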