A Swedish BERT model
Model description
This model follows the BERT-Large architecture as implemented in the Megatron-LM framework. It was trained with a batch size of 512 for 600k steps. The model has the following hyperparameters:
Hyperparameter | Value |
---|---|
Parameters | 340M |
Layers | 24 |
Attention heads | 16 |
 | 1024 |
Vocabulary size | 30592 |
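As a rough sanity check on the 340M figure, the sketch below estimates the encoder's parameter count from the layer count (24) and vocabulary size (30592). It assumes standard BERT-Large internals that this card does not state: hidden size 1024, feed-forward size 4 × 1024, 512 position embeddings and 2 segment types. The masked-LM output head is not counted, and the number of attention heads does not affect the total because the heads split the same hidden dimension.

```python
# Rough parameter count for a BERT-Large-style encoder.
# Assumptions (not stated in the model card): hidden size 1024,
# feed-forward size 4 * hidden, 512 position embeddings, 2 token types,
# biases and LayerNorms as in the original BERT design; no MLM head.

def bert_param_count(vocab_size: int, hidden: int, layers: int,
                     max_positions: int = 512, type_vocab: int = 2) -> int:
    ffn = 4 * hidden  # assumed feed-forward (intermediate) size

    # Embeddings: token + position + segment tables, plus one LayerNorm (gamma, beta)
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden

    # One Transformer layer
    attention = 4 * (hidden * hidden + hidden)      # Q, K, V and output projections with biases
    feed_forward = 2 * hidden * ffn + ffn + hidden  # two dense layers with biases
    layer_norms = 2 * (2 * hidden)                  # post-attention and post-FFN LayerNorms
    per_layer = attention + feed_forward + layer_norms

    # Pooler: one dense layer applied to the [CLS] vector
    pooler = hidden * hidden + hidden

    return embeddings + layers * per_layer + pooler


if __name__ == "__main__":
    total = bert_param_count(vocab_size=30592, hidden=1024, layers=24)
    print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # 335,213,568 parameters (~335M)
```

Under these assumptions the count is about 335M, consistent with the rounded 340M usually quoted for BERT-Large.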
Training data
The model was pretrained on a Swedish text corpus of around 85 GB drawn from a variety of sources, listed below.
Dataset | Genre | Size (GB) |
---|---|---|
Anföranden | Politics | 0.9 |
DCEP | Politics | 0.6 |
DGT | Politics | 0.7 |
Fass | Medical | 0.6 |
Författningar | Legal | 0.1 |
Web data | Misc | 45.0 |
JRC | Legal | 0.4 |
Litteraturbanken | Books | 0.3 |
SCAR | Misc | 28.0 |
SOU | Politics | 5.3 |
Subtitles | Drama | 1.3 |
Wikipedia | Facts | 1.8 |
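The sizes in the table can be tallied to confirm the 85 GB total and to see how the corpus splits by genre. This is a small illustrative sketch; the data structure and helper name `size_by_genre` are hypothetical, and the numbers are copied from the table above.

```python
from collections import defaultdict

# (dataset, genre, size in GB), copied from the training-data table
CORPUS = [
    ("Anföranden", "Politics", 0.9),
    ("DCEP", "Politics", 0.6),
    ("DGT", "Politics", 0.7),
    ("Fass", "Medical", 0.6),
    ("Författningar", "Legal", 0.1),
    ("Web data", "Misc", 45.0),
    ("JRC", "Legal", 0.4),
    ("Litteraturbanken", "Books", 0.3),
    ("SCAR", "Misc", 28.0),
    ("SOU", "Politics", 5.3),
    ("Subtitles", "Drama", 1.3),
    ("Wikipedia", "Facts", 1.8),
]


def size_by_genre(corpus):
    """Sum the dataset sizes (GB) for each genre."""
    totals = defaultdict(float)
    for _name, genre, size_gb in corpus:
        totals[genre] += size_gb
    return dict(totals)


if __name__ == "__main__":
    total = sum(size_gb for _name, _genre, size_gb in CORPUS)
    print(f"Total: {total:.1f} GB")
    # Genres from largest to smallest, with their share of the corpus
    for genre, size_gb in sorted(size_by_genre(CORPUS).items(), key=lambda kv: -kv[1]):
        print(f"{genre:<10}{size_gb:6.1f} GB  {100 * size_gb / total:5.1f}%")
```

The "Misc" genre (web data and SCAR) accounts for 73 GB, roughly 86% of the corpus.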
Intended uses & limitations
The raw model can be used for the standard BERT tasks of masked language modeling and next sentence prediction. It is also often fine-tuned on a downstream task to improve its performance in a specific domain or task.
How to use
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
# Download the cased Swedish tokenizer and the masked-LM model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("AI-Nordics/bert-large-swedish-cased")
model = AutoModelForMaskedLM.from_pretrained("AI-Nordics/bert-large-swedish-cased")
```
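For masked language modeling, the model can also be queried through the `fill-mask` pipeline in `transformers`. This is a minimal sketch, assuming the checkpoint loads with the pipeline's default settings; the Swedish example sentence and the helper `format_predictions` are illustrative and not part of the original card. No expected output is shown because the predictions depend on the downloaded weights.

```python
def format_predictions(predictions, top_k=5):
    """Turn fill-mask pipeline output (a list of dicts) into (token, score) pairs."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 3)) for p in ranked[:top_k]]


def main():
    # Imported here so the helper above can be used without transformers installed
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="AI-Nordics/bert-large-swedish-cased")
    mask = fill_mask.tokenizer.mask_token  # use the tokenizer's own mask token
    # Illustrative sentence: "The capital of Sweden is <mask>."
    text = f"Huvudstaden i Sverige är {mask}."
    for token, score in format_predictions(fill_mask(text)):
        print(f"{token:<15}{score}")
```

Calling `main()` downloads the model weights and prints the five highest-scoring candidates for the masked position.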