---
title: README
emoji: 📊
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
---

# 🏡 TensorFlow Model Garden LMs

This organization showcases language model pretraining with the awesome [TensorFlow Model Garden](https://github.com/tensorflow/models) library.

The following LMs are currently supported:

* [BERT Pretraining](https://aclanthology.org/N19-1423/) - see [pretraining instructions](https://github.com/stefan-it/model-garden-lms/tree/main/bert)
* [Token Dropping for efficient BERT Pretraining](https://aclanthology.org/2022.acl-long.262/) - see [pretraining instructions](https://github.com/stefan-it/model-garden-lms/tree/main/token-dropping-bert)
* [Training ELECTRA Augmented with Multi-word Selection](https://aclanthology.org/2021.findings-acl.219/) (TEAMS) - see [pretraining instructions](https://github.com/stefan-it/model-garden-lms/tree/main/teams)

# 🍷 FineWeb-LMs

The following LMs were pretrained on the 10BT subsets of the famous [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) datasets:

* BERT-based - find the [best model checkpoint here](https://huggingface.co/model-garden-lms/bert-base-finewebs-951k)
* Token Dropping BERT-based - find the [best model checkpoint here](https://huggingface.co/model-garden-lms/bert-base-token-dropping-finewebs-901k)
* TEAMS-based - find the [best model checkpoint here](https://huggingface.co/model-garden-lms/teams-base-finewebs-1m)

# 📊 ScandEval Evaluation

To find the best checkpoints and to compare our FineWeb-LMs with other models (BERT, ELECTRA and RoBERTa), we evaluate them with the great [ScandEval](https://github.com/ScandEval/ScandEval) library.

| Model ID | Avg. Score | CoNLL-En | SST5 | ScaLA-En | SQuAD |
|---|---|---|---|---|---|
| [model-garden-lms/bert-base-finewebs-951k](https://huggingface.co/model-garden-lms/bert-base-finewebs-951k) | 69.41 | 89.25 ± 0.4 / 88.9 ± 0.37 | 58.17 ± 1.26 / 59.86 ± 1.65 | 58.83 ± 3.46 / 78.22 ± 2.11 | 55.66 ± 1.19 / 66.36 ± 1.42 |
| [model-garden-lms/bert-base-token-dropping-finewebs-901k](https://huggingface.co/model-garden-lms/bert-base-token-dropping-finewebs-901k) | 68.01 | 88.98 ± 0.64 / 88.67 ± 0.55 | 57.79 ± 1.31 / 58.91 ± 1.85 | 54.25 ± 6.3 / 75.73 ± 3.54 | 54.4 ± 0.72 / 65.31 ± 1.01 |
| [model-garden-lms/teams-base-finewebs-1m](https://huggingface.co/model-garden-lms/teams-base-finewebs-1m) | **72.64** | 89.27 ± 0.41 / 88.82 ± 0.41 | 59.58 ± 0.64 / 62.63 ± 3.0 | 66.72 ± 0.94 / 83.01 ± 0.45 | 59.95 ± 0.71 / 71.13 ± 0.58 |
| [google-bert/bert-base-cased](https://huggingface.co/google-bert/bert-base-cased) | 62.26 | 87.39 ± 0.79 / 87.11 ± 0.66 | 54.49 ± 1.36 / 53.22 ± 1.15 | 52.08 ± 2.13 / 74.52 ± 1.31 | 38.63 ± 2.1 / 50.68 ± 1.87 |
| [google/electra-base-discriminator](https://huggingface.co/google/electra-base-discriminator) | 69.26 | 87.82 ± 0.69 / 86.83 ± 0.62 | 62.3 ± 1.12 / 55.93 ± 0.67 | 62.61 ± 1.21 / 80.85 ± 0.59 | 52.51 ± 0.86 / 65.2 ± 0.85 |
| [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) | 68.96 | 90.35 ± 0.23 / 90.14 ± 0.2 | 60.95 ± 1.4 / 57.52 ± 1.97 | 50.64 ± 1.69 / 74.55 ± 0.9 | 57.82 ± 1.35 / 69.68 ± 1.02 |

The TEAMS model achieves the highest average score (72.64) and outperforms both ELECTRA (69.26) and RoBERTa (68.96), even though those models were pretrained on much more data and for many more steps.

Detailed results can be found in [this dataset repository](https://huggingface.co/datasets/model-garden-lms/finewebs-scandeval-results).
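The Avg. Score column appears to be consistent with a plain, unweighted mean of the eight per-task values in each row (two metrics for each of the four tasks, ignoring the ± values). The Python sketch below checks this assumption against the rounded numbers shown in the table. It is not ScandEval's own aggregation code, and because it starts from rounded values, deviations of up to about 0.005 from the reported averages are expected.

```python
# Sketch: recompute "Avg. Score" as an unweighted mean of the per-task values.
# ASSUMPTION: the average is the plain mean of all eight numbers per row
# (CoNLL-En, SST5, ScaLA-En, SQuAD, two metrics each); the ± values are dropped.

SCORES = {
    "model-garden-lms/bert-base-finewebs-951k": [89.25, 88.9, 58.17, 59.86, 58.83, 78.22, 55.66, 66.36],
    "model-garden-lms/bert-base-token-dropping-finewebs-901k": [88.98, 88.67, 57.79, 58.91, 54.25, 75.73, 54.4, 65.31],
    "model-garden-lms/teams-base-finewebs-1m": [89.27, 88.82, 59.58, 62.63, 66.72, 83.01, 59.95, 71.13],
    "google-bert/bert-base-cased": [87.39, 87.11, 54.49, 53.22, 52.08, 74.52, 38.63, 50.68],
    "google/electra-base-discriminator": [87.82, 86.83, 62.3, 55.93, 62.61, 80.85, 52.51, 65.2],
    "FacebookAI/roberta-base": [90.35, 90.14, 60.95, 57.52, 50.64, 74.55, 57.82, 69.68],
}

# Averages as reported in the table.
REPORTED_AVG = {
    "model-garden-lms/bert-base-finewebs-951k": 69.41,
    "model-garden-lms/bert-base-token-dropping-finewebs-901k": 68.01,
    "model-garden-lms/teams-base-finewebs-1m": 72.64,
    "google-bert/bert-base-cased": 62.26,
    "google/electra-base-discriminator": 69.26,
    "FacebookAI/roberta-base": 68.96,
}


def average_score(scores: list[float]) -> float:
    """Unweighted mean over all per-task metric values (assumed aggregation)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Sort models by recomputed average, best first, and show the reported value next to it.
    ranking = sorted(SCORES, key=lambda m: average_score(SCORES[m]), reverse=True)
    for model_id in ranking:
        recomputed = average_score(SCORES[model_id])
        print(f"{recomputed:7.3f} (reported {REPORTED_AVG[model_id]:.2f})  {model_id}")
```

Running it as a script prints every model with its recomputed and reported average, sorted by the recomputed value, which makes it easy to see that the ranking in the table follows directly from the per-task scores.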
# ❤️ Acknowledgements

This repository is the outcome of the last two years of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/) and the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.

Made in the Bavarian Oberland with ❤️ and 🥨.