
Khasi Fill-Mask Model

This project demonstrates how to use the Hugging Face Transformers library to perform a fill-mask task using the jefson08/kha-roberta model. The fill-mask task predicts the most likely token(s) to replace the [MASK] token in a given sentence.


Usage

1. Import Dependencies

from transformers import pipeline, AutoTokenizer

2. Initialize the Model and Tokenizer

Load the tokenizer and model pipeline:

# Initialisation
tokenizer = AutoTokenizer.from_pretrained('jefson08/kha-roberta')
fill_mask = pipeline(
    "fill-mask",
    model="jefson08/kha-roberta",
    tokenizer=tokenizer,
    device="cuda",  # Use "cuda" for GPU or omit for CPU
)
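
If you are not sure whether a GPU will be available at run time, the small variation below picks the device dynamically. It is a sketch that assumes a recent Transformers release accepting device strings such as "cuda" and "cpu":

import torch

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
fill_mask = pipeline(
    "fill-mask",
    model="jefson08/kha-roberta",
    tokenizer=tokenizer,
    device=device,
)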

3. Predict the [MASK] Token

Provide a sentence with a [MASK] token for prediction:

# Predict [MASK] token
sentence = "Nga dei u briew u ba [MASK] bha."
predictions = fill_mask(sentence)

# Display predictions
for prediction in predictions:
    print(f"{prediction['sequence']} (score: {prediction['score']:.4f})")

Example Output

Given the input sentence:

"Nga dei u briew u ba [MASK] bha."

The pipeline might return a list of predictions similar to the following (shown here as the raw list, before the formatted printing in the loop above):

[{'score': 0.09230164438486099,
  'token': 6086,
  'token_str': 'mutlop',
  'sequence': 'Nga dei u briew u ba  mutlop bha.'},
 {'score': 0.051360130310058594,
  'token': 2059,
  'token_str': 'stad',
  'sequence': 'Nga dei u briew u ba  stad bha.'},
 {'score': 0.045497000217437744,
  'token': 1864,
  'token_str': 'khuid',
  'sequence': 'Nga dei u briew u ba  khuid bha.'},
 {'score': 0.04180142655968666,
  'token': 668,
  'token_str': 'kham',
  'sequence': 'Nga dei u briew u ba  kham bha.'},
 {'score': 0.027332570403814316,
  'token': 2817,
  'token_str': 'khlaiñ',
  'sequence': 'Nga dei u briew u ba  khlaiñ bha.'}]

Model Information

The jefson08/kha-roberta model is a RoBERTa-style masked language model (about 111M parameters) fine-tuned for Khasi text. Used with the fill-mask pipeline, it predicts the most likely replacements for [MASK] tokens in a sentence, offering a window into its contextual understanding of the language.
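
For readers who prefer to work below the pipeline abstraction, the sketch below performs the same prediction with AutoModelForMaskedLM directly. It assumes the checkpoint exposes a standard masked-LM head, and it substitutes the tokenizer's own mask token rather than hardcoding [MASK]:

# Minimal sketch: fill-mask without the pipeline helper (assumes a standard masked-LM head).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jefson08/kha-roberta")
model = AutoModelForMaskedLM.from_pretrained("jefson08/kha-roberta")

# Use the tokenizer's mask token (RoBERTa-style tokenizers often use "<mask>").
sentence = "Nga dei u briew u ba [MASK] bha.".replace("[MASK]", tokenizer.mask_token)
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the five highest-scoring tokens.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_pos].softmax(dim=-1)
top5 = torch.topk(probs, k=5)
for score, token_id in zip(top5.values[0], top5.indices[0]):
    print(f"{tokenizer.decode(int(token_id)).strip()} (score: {score.item():.4f})")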


Dependencies

  • Transformers: Provides the pipeline and model-loading utilities.
  • PyTorch: Backend framework for running the model.

Install the dependencies with:

pip install transformers torch

Acknowledgements


License

This project is licensed under the MIT License. See the LICENSE file for more details.

