Update README.md
README.md
This model was pretrained on the bookcorpus dataset using knowledge distillation.

The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 384 (half the hidden size of BERT base) and 6 attention heads (hence the same head size as BERT).
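As a rough illustration of how those dimensions fit together, the sketch below builds a matching `BertConfig`; the layer count and intermediate size are assumptions (the README only specifies the hidden size and head count), so the actual configuration of this checkpoint may differ:

```python
from transformers import BertConfig

# Hypothetical configuration matching the description above.
config = BertConfig(
    hidden_size=384,          # half of bert-base-uncased's 768
    num_attention_heads=6,    # half of bert-base-uncased's 12
    num_hidden_layers=12,     # assumption: same depth as BERT base
    intermediate_size=1536,   # assumption: 4 * hidden_size, as in BERT
)

# The head size stays the same as in BERT base: 768 / 12 == 384 / 6 == 64.
assert config.hidden_size // config.num_attention_heads == 64
```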
The weights of the model were initialized by pruning the weights of bert-base-uncased.
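The README does not spell out how the pruning was carried out. Purely as a hedged illustration, one way to initialize a narrower layer from bert-base-uncased is to keep a subset of the hidden units when copying each weight matrix; the choice of which units to keep is an assumption here:

```python
import torch
from transformers import BertModel

teacher = BertModel.from_pretrained("bert-base-uncased")

# Assumption for illustration only: keep the first 384 of the 768 hidden units.
# The actual pruning criterion used for this model (magnitude, importance
# scores, ...) is not described in the README.
kept_units = torch.arange(384)

# A 768 -> 768 linear layer from the first encoder block of BERT base.
dense = teacher.encoder.layer[0].attention.output.dense
pruned_weight = dense.weight[kept_units][:, kept_units]  # shape: 384 x 384
pruned_bias = dense.bias[kept_units]                      # shape: 384
```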
Knowledge distillation was then performed using multiple loss functions to fine-tune the model.
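The exact losses are not listed in this card. A common combination for distilling masked language models mixes a soft-target KL term against the teacher's logits with the regular cross-entropy on the labels; the sketch below shows that pattern, with the temperature and mixing weight as assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Sketch of a combined distillation objective; the actual losses used
    for this model are not specified in the README."""
    # Soft-target loss: match the teacher's softened output distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-target loss: standard cross-entropy against the masked-LM labels.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```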
PS: the tokenizer is the same as the one used by bert-base-uncased.
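Since the tokenizer is unchanged, it can be loaded directly from bert-base-uncased; the model identifier below is a placeholder for this checkpoint's actual repository name:

```python
from transformers import AutoModel, AutoTokenizer

# The tokenizer is identical to bert-base-uncased, so load it from there.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Placeholder: replace with this model's actual repository id.
model = AutoModel.from_pretrained("<this-model-repo>")

inputs = tokenizer("Knowledge distillation shrinks BERT.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 384)
```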