gaunernst/bert-L8-H256-uncased

gaunernst committed on Dec 2, 2023

Commit 98e3346 (1 parent: a6a5c5a)

Update README.md

Files changed (1)

README.md +64 -0

@@ -1,3 +1,67 @@
 ---
 license: apache-2.0
+datasets:
+- bookcorpus
+- wikipedia
+language:
+- en
 ---
+# BERT L8-H256 (uncased)
+Mini BERT models from [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) that the HF team didn't convert. These checkpoints were converted with the original [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py).
+See the original Google repo: [google-research/bert](https://github.com/google-research/bert)
+Note: it's not clear if these checkpoints have undergone knowledge distillation.
+## Model variants
+|   |H=128|H=256|H=512|H=768|
+|---|:---:|:---:|:---:|:---:|
+| **L=2**  |[2/128 (BERT-Tiny)][2_128]|[2/256][2_256]|[2/512][2_512]|[2/768][2_768]|
+| **L=4**  |[4/128][4_128]|[4/256 (BERT-Mini)][4_256]|[4/512 (BERT-Small)][4_512]|[4/768][4_768]|
+| **L=6**  |[6/128][6_128]|[6/256][6_256]|[6/512][6_512]|[6/768][6_768]|
+| **L=8**  |[8/128][8_128]|[**8/256**][8_256]|[8/512 (BERT-Medium)][8_512]|[8/768][8_768]|
+| **L=10** |[10/128][10_128]|[10/256][10_256]|[10/512][10_512]|[10/768][10_768]|
+| **L=12** |[12/128][12_128]|[12/256][12_256]|[12/512][12_512]|[12/768 (BERT-Base, original)][12_768]|
+[2_128]: https://huggingface.co/gaunernst/bert-tiny-uncased
+[2_256]: https://huggingface.co/gaunernst/bert-L2-H256-uncased
+[2_512]: https://huggingface.co/gaunernst/bert-L2-H512-uncased
+[2_768]: https://huggingface.co/gaunernst/bert-L2-H768-uncased
+[4_128]: https://huggingface.co/gaunernst/bert-L4-H128-uncased
+[4_256]: https://huggingface.co/gaunernst/bert-mini-uncased
+[4_512]: https://huggingface.co/gaunernst/bert-small-uncased
+[4_768]: https://huggingface.co/gaunernst/bert-L4-H768-uncased
+[6_128]: https://huggingface.co/gaunernst/bert-L6-H128-uncased
+[6_256]: https://huggingface.co/gaunernst/bert-L6-H256-uncased
+[6_512]: https://huggingface.co/gaunernst/bert-L6-H512-uncased
+[6_768]: https://huggingface.co/gaunernst/bert-L6-H768-uncased
+[8_128]: https://huggingface.co/gaunernst/bert-L8-H128-uncased
+[8_256]: https://huggingface.co/gaunernst/bert-L8-H256-uncased
+[8_512]: https://huggingface.co/gaunernst/bert-medium-uncased
+[8_768]: https://huggingface.co/gaunernst/bert-L8-H768-uncased
+[10_128]: https://huggingface.co/gaunernst/bert-L10-H128-uncased
+[10_256]: https://huggingface.co/gaunernst/bert-L10-H256-uncased
+[10_512]: https://huggingface.co/gaunernst/bert-L10-H512-uncased
+[10_768]: https://huggingface.co/gaunernst/bert-L10-H768-uncased
+[12_128]: https://huggingface.co/gaunernst/bert-L12-H128-uncased
+[12_256]: https://huggingface.co/gaunernst/bert-L12-H256-uncased
+[12_512]: https://huggingface.co/gaunernst/bert-L12-H512-uncased
+[12_768]: https://huggingface.co/bert-base-uncased
+## Usage
+See other BERT model cards, e.g. [bert-base-uncased](https://huggingface.co/bert-base-uncased).
+## Citation
+```bibtex
+@article{turc2019,
+  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
+  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
+  journal={arXiv preprint arXiv:1908.08962v2},
+  year={2019}
+}
+```
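
The Usage section of the README defers to other BERT model cards rather than showing code. As a minimal sketch of what loading this checkpoint might look like, assuming `transformers` and PyTorch are installed and that the repo loads through the standard `AutoTokenizer` and `AutoModel` classes (the README does not show this itself). The helper names `variant_repo_id` and `load_variant` are hypothetical; the repo ids are taken from the link table in the Model variants section.

```python
# Hypothetical helpers; the names are illustrative and not part of the README.

# Variants that the README's table links under a nickname instead of the
# gaunernst/bert-L{layers}-H{hidden}-uncased pattern.
NAMED_VARIANTS = {
    (2, 128): "gaunernst/bert-tiny-uncased",
    (4, 256): "gaunernst/bert-mini-uncased",
    (4, 512): "gaunernst/bert-small-uncased",
    (8, 512): "gaunernst/bert-medium-uncased",
    (12, 768): "bert-base-uncased",
}

LAYERS = (2, 4, 6, 8, 10, 12)
HIDDEN_SIZES = (128, 256, 512, 768)


def variant_repo_id(layers: int, hidden: int) -> str:
    """Return the Hub repo id that the README's table links for (L, H)."""
    if layers not in LAYERS or hidden not in HIDDEN_SIZES:
        raise ValueError(f"no variant listed for L={layers}, H={hidden}")
    default_id = f"gaunernst/bert-L{layers}-H{hidden}-uncased"
    return NAMED_VARIANTS.get((layers, hidden), default_id)


def load_variant(layers: int = 8, hidden: int = 256):
    """Download the tokenizer and encoder from the Hub (needs network access)."""
    # Imported lazily so variant_repo_id works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    repo_id = variant_repo_id(layers, hidden)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModel.from_pretrained(repo_id)
    return tokenizer, model
```

With network access, `tokenizer, model = load_variant()` followed by `model(**tokenizer("Hello world", return_tensors="pt"))` should return an output whose `last_hidden_state` has a final dimension of 256 for this variant, provided the converted config matches the repo name.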