# Soraberta: Unsupervised Language Model Pre-training for Wolof
**bert-base-wolof** is a bert-base model pretrained on the Wolof language.
## Soraberta models
| Model name | Number of layers | Attention heads | Embedding dimension | Total parameters |
|---|---|---|---|---|
| `bert-base` | 6 | 12 | 514 | 56,931,622 (~56.9M) |
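The parameter count in the table can be checked directly against the released checkpoint; a minimal sketch, using the model id from the usage example below:

```python
from transformers import AutoModelForMaskedLM

# Load the released checkpoint from the Hugging Face Hub
model = AutoModelForMaskedLM.from_pretrained("abdouaziiz/bert-base-wolof")

# Sum the element counts of all parameter tensors
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")
```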
## Using Soraberta with Hugging Face's Transformers
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/bert-base-wolof')
>>> unmasker("kuy yoot du [MASK].")
[{'sequence': '[CLS] kuy yoot du seqet. [SEP]',
  'score': 0.09505125880241394,
  'token': 13578},
 {'sequence': '[CLS] kuy yoot du daw. [SEP]',
  'score': 0.08882280439138412,
  'token': 679},
 {'sequence': '[CLS] kuy yoot du yoot. [SEP]',
  'score': 0.057790059596300125,
  'token': 5117},
 {'sequence': '[CLS] kuy yoot du seqat. [SEP]',
  'score': 0.05671025067567825,
  'token': 4992},
 {'sequence': '[CLS] kuy yoot du yaqu. [SEP]',
  'score': 0.0469999685883522,
  'token': 1735}]
```
## Training data
The data sources are Bible OT, WOLOF-ONLINE, and ALFFA_PUBLIC.
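The card does not spell out the pre-training setup. For orientation, a generic masked-language-modeling run with the Transformers `Trainer` would look roughly like the sketch below; the corpus file name, batch size, and 15% masking rate are illustrative assumptions, not the authors' actual recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("abdouaziiz/bert-base-wolof")
model = AutoModelForMaskedLM.from_pretrained("abdouaziiz/bert-base-wolof")

# "wolof_corpus.txt" is a placeholder for a concatenated plain-text corpus
dataset = load_dataset("text", data_files={"train": "wolof_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are masked at random
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="soraberta-mlm", per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```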
## Contact
Please contact [email protected] with any questions, feedback, or requests.