bconsolvo committed
Commit 34fd491 · 1 Parent(s): 9312cfb

Update README.md

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -3,14 +3,15 @@ language: en
  license: apache-2.0
  tags:
  - fill-mask
  datasets:
  - wikipedia
  - bookcorpus
  ---
- ## Model Details: 90% Sparse BERT-Base (uncased) Prune Once For All
- This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. The process of weight pruning is forcing some of the weights of the neural network to zero. Setting some of the neural network's weights to zero results in sparse matrices. Updating neural network weights does involve matrix multiplication, and if we can keep the matrices sparse while retaining enough important information, we can reduce the overall computation overhead. The "sparse" in the title of the model indicates a ratio of sparsity in the weights; for more details, you can read [Zafrir et al. (2021)](https://arxiv.org/abs/2111.05754).

- Visualization of Prune Once For All method from [Zafrir et al. (2021)](https://arxiv.org/abs/2111.05754):
  ![Zafrir2021_Fig1.png](https://s3.amazonaws.com/moonup/production/uploads/6297f0e30bd2f58c647abb1d/nSDP62H9NHC1FA0C429Xo.png)

  | Model Detail | Description |
@@ -26,7 +27,7 @@ Visualization of Prune Once For All method from [Zafrir et al. (2021)](https://

  | Intended Use | Description |
  | ----------- | ----------- |
- | Primary intended uses | This is a general sparse language model; in its current form, it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks including (but not limited to) SQuADv1.1, QNLI, MNLI, SST-2 and QQP. |
  | Primary intended users | Anyone who needs an efficient general language model for other downstream tasks. |
  | Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people.|
 
@@ -66,7 +67,7 @@ All the results are the mean of two separate experiments with the same hyper-par
  | Training and Evaluation Data | Description |
  | ----------- | ----------- |
  | Datasets | [English Wikipedia Dataset](https://huggingface.co/datasets/wikipedia) (2500M words). |
- | Motivation | To build an efficient and accurate model for the question answering task. |
  | Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers ([Devlin et al., 2019](https://arxiv.org/abs/1810.04805), [Sanh et al., 2019](https://arxiv.org/abs/1910.01108)). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0.1." |

  | Ethical Considerations | Description |
 
  license: apache-2.0
  tags:
  - fill-mask
+ - bert
  datasets:
  - wikipedia
  - bookcorpus
  ---
+ ## Model Details: 90% Sparse BERT-Base (uncased) Prune Once for All
+ This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. Weight pruning is the process of forcing some of a neural network's weights to zero. Setting some of the weights to zero results in sparser matrices. Since updating neural network weights involves matrix multiplication, keeping the matrices sparse while retaining enough important information reduces the overall computational overhead. The term "sparse" in the title of the model indicates the ratio of sparsity in the weights; for more details, you can read [Zafrir et al. (2021)](https://arxiv.org/abs/2111.05754).

+ Visualization of the Prune Once for All method from [Zafrir et al. (2021)](https://arxiv.org/abs/2111.05754):
  ![Zafrir2021_Fig1.png](https://s3.amazonaws.com/moonup/production/uploads/6297f0e30bd2f58c647abb1d/nSDP62H9NHC1FA0C429Xo.png)
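
The updated introduction above describes a sparse pre-trained checkpoint exposed through the fill-mask task. As a quick editorial illustration (not text from the model card), the sketch below loads such a checkpoint with the Hugging Face Transformers API, runs the masked-language-modelling objective it was pre-trained with, and measures the fraction of exactly-zero weights in the encoder's linear layers. The repository id is an assumption; substitute the id of this model repo.

```python
# Minimal sketch, assuming a repository id for this sparse checkpoint.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "Intel/bert-base-uncased-sparse-90-unstructured-pruneofa"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# The checkpoint was pre-trained with masked-language modelling (the "fill-mask" tag).
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("Paris is the [MASK] of France.", top_k=3))

# Fraction of exactly-zero weights in the Linear layers; for a 90% sparse
# checkpoint this should come out close to 0.9.
zeros = total = 0
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        weight = module.weight.detach()
        zeros += int((weight == 0).sum())
        total += weight.numel()
print(f"zero weights in Linear layers: {zeros / total:.1%}")
```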
  | Model Detail | Description |
 

  | Intended Use | Description |
  | ----------- | ----------- |
+ | Primary intended uses | This is a general sparse language model; in its current form, it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks including (but not limited to) question answering, natural language inference, and sentiment classification. |
  | Primary intended users | Anyone who needs an efficient general language model for other downstream tasks. |
  | Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people. |
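
To make the "can be fine-tuned" row above concrete, here is a hedged sketch of fine-tuning the sparse checkpoint on a sentiment-classification task with the standard Transformers `Trainer`. The repository id, the GLUE SST-2 dataset choice, and the hyper-parameters are illustrative assumptions, not values taken from this model card.

```python
# Illustrative fine-tuning sketch; ids and hyper-parameters are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "Intel/bert-base-uncased-sparse-90-unstructured-pruneofa"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# A randomly initialised classification head is placed on top of the sparse encoder.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

dataset = load_dataset("glue", "sst2")
dataset = dataset.map(lambda batch: tokenizer(batch["sentence"], truncation=True),
                      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sparse-bert-sst2",
                           num_train_epochs=3,
                           per_device_train_batch_size=32,
                           learning_rate=2e-5),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding through the default collator
)
trainer.train()
```

Note that a plain `Trainer` run like this does not by itself keep the 90% sparsity pattern: pruned weights receive gradients and become non-zero again. The Prune Once for All recipe fine-tunes with the pruning masks held fixed, which requires the authors' training tooling rather than the vanilla call sketched here.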
 
 
  | Training and Evaluation Data | Description |
  | ----------- | ----------- |
  | Datasets | [English Wikipedia Dataset](https://huggingface.co/datasets/wikipedia) (2500M words). |
+ | Motivation | To build an efficient and accurate base model for several downstream language tasks. |
  | Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers ([Devlin et al., 2019](https://arxiv.org/abs/1810.04805), [Sanh et al., 2019](https://arxiv.org/abs/1910.01108)). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0.1." |

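As a rough editorial illustration of the quoted preprocessing recipe (not the authors' script), the snippet below performs the 95%/5% train/validation split and applies the rule of using the model's maximum sequence length while allowing a shorter sequence with probability 0.1. The Wikipedia dump configuration name is an assumption.

```python
# Rough illustration of the quoted recipe; the dataset configuration and the
# helper below are assumptions, not the authors' code.
import random
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
max_len = tokenizer.model_max_length  # 512 for BERT-Base

wiki = load_dataset("wikipedia", "20220301.en", split="train")
splits = wiki.train_test_split(test_size=0.05, seed=0)  # 95% train / 5% validation
train_set, validation_set = splits["train"], splits["test"]

def target_sequence_length() -> int:
    """Maximum sequence length, but a shorter one with probability 0.1."""
    if random.random() < 0.1:
        return random.randint(2, max_len)
    return max_len
```
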
  | Ethical Considerations | Description |