Update README.md
README.md (changed):

```diff
@@ -65,7 +65,7 @@ All the results are the mean of two separate experiments with the same hyper-parameters.
 
 | Training and Evaluation Data | Description |
 | ----------- | ----------- |
-| Datasets | English Wikipedia |
+| Datasets | [English Wikipedia Dataset](https://huggingface.co/datasets/wikipedia) (2500M words). |
 | Motivation | To build an efficient and accurate model for the question answering task. |
 | Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers ([Devlin et al., 2019](https://arxiv.org/abs/1810.04805), [Sanh et al., 2019](https://arxiv.org/abs/1910.01108)). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0.1." |
 
```
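For reference, a minimal sketch (not part of this commit) of how the preprocessing described in the table could look with the Hugging Face `datasets` and `transformers` libraries: load the English Wikipedia dump linked above, split it 95%/5% into train and validation, and tokenize to the model's maximum sequence length while allowing shorter sequences with probability 0.1. The `20220301.en` config, the `distilbert-base-uncased` checkpoint, the seed, and the 32-token lower bound are illustrative assumptions, not values taken from the commit.

```python
# Sketch of the described split and tokenization; names below are assumptions.
import random

from datasets import load_dataset
from transformers import AutoTokenizer

# English Wikipedia from the Hugging Face Hub (config name is an assumption).
wiki = load_dataset("wikipedia", "20220301.en", split="train")

# 95% train / 5% validation, as described in the Preprocessing row.
splits = wiki.train_test_split(test_size=0.05, seed=42)

# Checkpoint is illustrative; any of the compared models could be used here.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
max_len = tokenizer.model_max_length  # maximum sequence length allowed by the model


def tokenize(batch):
    # With probability 0.1, truncate this batch to a shorter random length,
    # mirroring "we allow shorter sequences at a probability of 0.1".
    length = max_len if random.random() >= 0.1 else random.randint(32, max_len)
    return tokenizer(batch["text"], truncation=True, max_length=length)


train = splits["train"].map(tokenize, batched=True, remove_columns=wiki.column_names)
valid = splits["test"].map(tokenize, batched=True, remove_columns=wiki.column_names)
```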