---
language:
- 'no'
- nb
- nn
inference: false
tags:
- BERT
- NorBERT
- Norwegian
- encoder
license: cc-by-4.0
---

# NorBERT 3 base

The official release of a new generation of NorBERT language models described in the paper [**NorBench — A Benchmark for Norwegian Language Models**](https://openreview.net/forum?id=WgxNONkAbz). Please read the paper to learn more about the model.

## Other sizes:

- [NorBERT 3 xs (15M)](https://huggingface.co/ltg/norbert3-xs)
- [NorBERT 3 small (40M)](https://huggingface.co/ltg/norbert3-small)
- [NorBERT 3 base (123M)](https://huggingface.co/ltg/norbert3-base)
- [NorBERT 3 large (323M)](https://huggingface.co/ltg/norbert3-large)

## Generative NorT5 siblings:

- [NorT5 xs (15M)](https://huggingface.co/ltg/nort5-xs)
- [NorT5 small (40M)](https://huggingface.co/ltg/nort5-small)
- [NorT5 base (123M)](https://huggingface.co/ltg/nort5-base)
- [NorT5 large (323M)](https://huggingface.co/ltg/nort5-large)

## Example usage

This model currently needs a custom wrapper from `modeling_norbert.py`. Once that file is importable, you can use the model like this:

```python
import torch
from transformers import AutoTokenizer
from modeling_norbert import NorbertForMaskedLM

# Load the tokenizer and the masked-LM wrapper from the same model folder
tokenizer = AutoTokenizer.from_pretrained("path/to/folder")
bert = NorbertForMaskedLM.from_pretrained("path/to/folder")

# Look up the id of the [MASK] token and tokenize a sentence containing it
mask_id = tokenizer.convert_tokens_to_ids("[MASK]")
input_text = tokenizer("Nå ønsker de seg en[MASK] bolig.", return_tensors="pt")

# Run the model, then replace only the masked position with the most likely token
output_p = bert(**input_text)
output_text = torch.where(input_text.input_ids == mask_id, output_p.logits.argmax(-1), input_text.input_ids)

# should output: '[CLS] Nå ønsker de seg en ny bolig.[SEP]'
print(tokenizer.decode(output_text[0].tolist()))
```

The following classes are currently implemented: `NorbertForMaskedLM`, `NorbertForSequenceClassification`, `NorbertForTokenClassification`, `NorbertForQuestionAnswering` and `NorbertForMultipleChoice`.
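The `torch.where` call in the usage example keeps every original token id and substitutes the model's argmax prediction only at positions that held `[MASK]`. The selection logic can be sketched in plain Python without loading the model; the `fill_masks` helper and all token ids below are hypothetical and chosen only for illustration, since real ids come from the tokenizer:

```python
def fill_masks(input_ids, predicted_ids, mask_id):
    """Return a copy of input_ids with each mask_id replaced by the prediction at that position."""
    if len(input_ids) != len(predicted_ids):
        raise ValueError("input_ids and predicted_ids must have the same length")
    return [pred if tok == mask_id else tok
            for tok, pred in zip(input_ids, predicted_ids)]


# Hypothetical ids: 4 stands in for [MASK]; 1 and 2 stand in for [CLS] and [SEP].
MASK_ID = 4
input_ids = [1, 120, 87, 4, 305, 2]
# Stand-in for logits.argmax(-1): the model predicts a token at every position,
# but only the prediction at the masked position is used.
predicted_ids = [1, 119, 90, 412, 300, 2]

filled = fill_masks(input_ids, predicted_ids, MASK_ID)
print(filled)  # [1, 120, 87, 412, 305, 2]
```

Predictions at unmasked positions (here 119, 90 and 300) are discarded, which is why the decoded output reproduces the input sentence everywhere except the mask.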
## Cite us

```bibtex
@inproceedings{
samuel2023norbench,
title={NorBench -- A Benchmark for Norwegian Language Models},
author={David Samuel and Andrey Kutuzov and Samia Touileb and Erik Velldal and Lilja {\O}vrelid and Egil R{\o}nningstad and Elina Sigdel and Anna Sergeevna Palatkina},
booktitle={The 24th Nordic Conference on Computational Linguistics},
year={2023},
url={https://openreview.net/forum?id=WgxNONkAbz}
}
```