---
language:
- sv
---
|
|
|
# megatron.bert-large.wordpiece-32k-pretok.25k-steps |
|
|
|
This BERT model was trained using the NeMo library. |
|
The model uses a standard BERT-large architecture.
|
The model was trained on more than 245 GB of data, consisting mostly of web data and Swedish newspaper text curated by the National Library of Sweden.
|
|
|
Training ran for 25k steps with a batch size of 8k.
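
A minimal sketch of loading the checkpoint with the Hugging Face `transformers` library. This assumes the repository id `KBLab/megatron.bert-large.wordpiece-32k-pretok.25k-steps` (inferred from the sibling-model links below) and a standard masked-language-modelling head:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Assumed repository id; adjust if the checkpoint is hosted under a different name.
model_id = "KBLab/megatron.bert-large.wordpiece-32k-pretok.25k-steps"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Illustrative fill-mask usage with a Swedish sentence.
unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(unmasker("Huvudstaden i Sverige är [MASK]."))
```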
|
|
|
The model has several sibling models trained on the same dataset but with different tokenizers or parameter counts:
|
- [megatron.bert-base.bpe-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.bpe-32k-no_pretok.25k-steps) |
|
- [megatron.bert-base.bpe-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.bpe-64k-no_pretok.25k-steps) |
|
- [megatron.bert-base.spe-bpe-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-32k-no_pretok.25k-steps) |
|
- [megatron.bert-base.spe-bpe-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-32k-pretok.25k-steps) |
|
- [megatron.bert-base.spe-bpe-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-64k-no_pretok.25k-steps) |
|
- [megatron.bert-base.spe-bpe-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.spe-bpe-64k-pretok.25k-steps) |
|
- [megatron.bert-base.unigram-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-32k-no_pretok.25k-steps) |
|
- [megatron.bert-base.unigram-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-32k-pretok.25k-steps) |
|
- [megatron.bert-base.unigram-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-64k-no_pretok.25k-steps) |
|
- [megatron.bert-base.unigram-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.unigram-64k-pretok.25k-steps) |
|
- [megatron.bert-base.wordpiece-32k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-32k-no_pretok.25k-steps) |
|
- [megatron.bert-base.wordpiece-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-32k-pretok.25k-steps) |
|
- [megatron.bert-base.wordpiece-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-64k-no_pretok.25k-steps) |
|
- [megatron.bert-base.wordpiece-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-base.wordpiece-64k-pretok.25k-steps) |
|
- [megatron.bert-large.bpe-64k-no_pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.bpe-64k-no_pretok.25k-steps) |
|
- [megatron.bert-large.spe-bpe-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.spe-bpe-32k-pretok.25k-steps) |
|
- [megatron.bert-large.unigram-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.unigram-32k-pretok.25k-steps) |
|
- [megatron.bert-large.unigram-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.unigram-64k-pretok.25k-steps) |
|
- [megatron.bert-large.wordpiece-32k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.wordpiece-32k-pretok.25k-steps) |
|
- [megatron.bert-large.wordpiece-64k-pretok.25k-steps](https://huggingface.co/KBLab/megatron.bert-large.wordpiece-64k-pretok.25k-steps) |
|
|
|
|
|
## Acknowledgements |
|
|
|
The training was performed on the Luxembourg national supercomputer MeluXina. |
|
The authors gratefully acknowledge the LuxProvide teams for their expert support. |
|
|
|
|