UrduParaphraseBERT / README.md
metadata
license: mit
datasets:
  - mwz/ur_para
language:
  - ur
tags:
  - paraphrase

Urdu Paraphrase Generation Model

Fine-tuned model for Urdu paraphrase generation

Model Description

The Urdu Paraphrase Generation Model is a language model trained on a 30k-row dataset of Urdu paraphrases. It is based on the roberta-urdu-small checkpoint (urduhack/roberta-urdu-small), fine-tuned for the task of paraphrase generation.

Features

  • Generate accurate and contextually relevant paraphrases in Urdu.
  • Maintain linguistic nuances and syntactic structures of the original input.
  • Handle a variety of input sentence lengths and complexities.

Usage

Installation

To use the Urdu Paraphrase Generation Model, follow these steps:

  1. Install the transformers library:
pip install transformers
  2. Load the model and tokenizer in your Python script:
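import torch  # needed for torch.no_grad() during generation
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the model and tokenizer.
# Note: this identifier is the base checkpoint; to use the fine-tuned
# weights, replace it with this model's repository id.
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")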

Generating Paraphrases

Use the following code snippet to generate paraphrases with the loaded model and tokenizer:

# Example sentence
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"

# Tokenize the input sentence
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)

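# Generate paraphrase (gradients are not needed at inference time).
# Note: generate() requires a model class with generation support; recent
# transformers releases may reject a masked-LM class at this call.
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)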

paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
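The snippet above paraphrases a single sentence and caps the output at max_length=128, so a longer passage may need to be split into sentences first. A minimal sketch, assuming sentences end with the Urdu full stop (U+06D4), the Arabic question mark (U+061F), or "!". The helper name split_urdu_sentences is hypothetical and is not part of this model or the transformers library:

```python
import re

# One sentence = a run of non-terminator characters plus an optional
# terminator: Urdu full stop (U+06D4), Arabic question mark (U+061F), or "!".
# Assumption: these three marks are the only sentence boundaries.
_SENTENCE = re.compile("[^\u06D4\u061F!]+[\u06D4\u061F!]?")


def split_urdu_sentences(text):
    """Return the sentences in text, each keeping its closing mark."""
    pieces = (match.strip() for match in _SENTENCE.findall(text))
    return [piece for piece in pieces if piece]
```

Each returned sentence can then be assigned to input_sentence and passed through the generation code above.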

Performance

The model has been fine-tuned on a 30k-row dataset of Urdu paraphrases. Quantitative results are not yet available: performance metrics, such as accuracy and fluency, are being evaluated and will be added here.
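Until metrics are published, a rough sanity check is to measure how much a generated paraphrase overlaps lexically with its input. The sketch below is an assumption and not the author's evaluation method: token_jaccard is a hypothetical helper that computes the Jaccard similarity of whitespace-separated tokens.

```python
def token_jaccard(source, paraphrase):
    """Jaccard similarity between the whitespace token sets of two strings."""
    source_tokens = set(source.split())
    paraphrase_tokens = set(paraphrase.split())
    if not source_tokens and not paraphrase_tokens:
        return 1.0  # two empty strings are treated as identical
    shared = source_tokens & paraphrase_tokens
    combined = source_tokens | paraphrase_tokens
    return len(shared) / len(combined)
```

A score near 1.0 suggests the output mostly copies the input. The score says nothing about whether the meaning is preserved, so it complements rather than replaces a proper evaluation.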

Contributing

Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.