barbaroo/nllb_200_600M_en_fo

Model Description

  • Model Architecture: This model is based on the NLLB-200 600M architecture and weights.
  • Languages: This checkpoint is fine-tuned to translate from English (en) to Faroese (fo).
  • Size: ~600M parameters.
  • Finetuning Datasets:
  • License: Inherits the original license of the NLLB-200 600M model.

Intended Use

  • Primary Use Case: Translate text from English to Faroese.
  • Audience: Researchers, developers, or anyone interested in Faroese language processing.
  • Usage Scenarios:
    • Building English-Faroese translation tools
    • Language research and corpus analysis
    • Synthetic data creation

Important: While the model can produce fluent translations, it is not guaranteed to be perfectly accurate on all inputs. Users should verify critical or sensitive content through human experts.

Metrics

  • Model performance measures:
    The NLLB-200 model was evaluated using BLEU, chrF, and BERTScore, metrics widely adopted by the machine translation community.
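As a rough illustration, BLEU and chrF can be computed with the sacrebleu package. The Faroese strings below are placeholders for illustration only, not outputs of this model:

import sacrebleu

# Placeholder hypothesis (model output) and reference translation in Faroese
hypotheses = ["Hey, hvussu hevur tú tað?"]
references = [["Hey, hvussu gongur tað hjá tær?"]]  # one list of references, aligned with the hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")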

Evaluation Data

  • Datasets:
    The FLORES-200 dataset, described in Section 4 of the NLLB paper.

  • Motivation:
    FLORES-200 is currently the only machine translation benchmark available for Faroese.
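Assuming the publicly hosted facebook/flores dataset on the Hugging Face Hub is used, the English-Faroese devtest split can be loaded as sketched below (newer versions of the datasets library may require trust_remote_code=True for script-based datasets):

from datasets import load_dataset

# English-Faroese pair configuration of FLORES-200 (assumed dataset id: facebook/flores)
flores = load_dataset("facebook/flores", "eng_Latn-fao_Latn", split="devtest")

sources = flores["sentence_eng_Latn"]     # English source sentences
references = flores["sentence_fao_Latn"]  # Faroese reference translations
print(sources[0], "->", references[0])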

How to Use

Below is a simple usage example in Python with Hugging Face Transformers:

from transformers import pipeline

model_name = "barbaroo/nllb_200_600M_en_fo"

translator = pipeline(
    "translation",
    model=model_name,
    tokenizer=model_name,
    src_lang="eng_Latn",   # Language code for English
    tgt_lang="fao_Latn"    # Language code for Faroese
)

text = "Hello, how are you?"
translation = translator(text)
print(translation[0]["translation_text"])  # the Faroese translation string
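If you prefer explicit control over generation, the standard NLLB pattern with AutoModelForSeq2SeqLM should also work, assuming this checkpoint keeps the original NLLB tokenizer and language codes:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "barbaroo/nllb_200_600M_en_fo"

# Load the tokenizer with English set as the source language
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")

# Force the decoder to start generating in Faroese
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fao_Latn"),
    max_length=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])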

Citation

If you use this model or find it helpful in your research, please cite: [COMING SOON]

Contact

For questions, feedback, or collaboration inquiries, feel free to reach out:
