inference: false
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
Urdu Paraphrase Generation Model
Fine-tuned model for Urdu paraphrase generation
Model Description
The Urdu Paraphrase Generation Model is a language model trained on a 30k-row dataset of Urdu paraphrases. It is based on the roberta-urdu-small architecture and fine-tuned for the specific task of generating high-quality paraphrases.
Features
- Generate accurate and contextually relevant paraphrases in Urdu.
- Maintain linguistic nuances and syntactic structures of the original input.
- Handle a variety of input sentence lengths and complexities.
Usage
Installation
To use the Urdu Paraphrase Generation Model, follow these steps:
- Install the transformers library (the examples below also require PyTorch):
pip install transformers torch
- Load the model and tokenizer in your Python script:
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
# Load the model and tokenizer
# Note: this identifier is the roberta-urdu-small base checkpoint named in the Model Description
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
Generating Paraphrases
Use the following code snippet to generate paraphrases with the loaded model and tokenizer:
# Example sentence
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"
# Tokenize the input sentence
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)
# Generate a paraphrase without tracking gradients
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)
# Decode the generated token IDs back into text
paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
Performance
The model has been fine-tuned on a 30k-row dataset of Urdu paraphrases. Quantitative results are not yet available: detailed performance metrics, such as accuracy and fluency, are being evaluated and will be added once they are ready.
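Until those metrics are published, a quick sanity check can help when trying the model yourself. The snippet below is a minimal sketch, not the evaluation used for this model: it computes the Jaccard overlap between the whitespace-separated tokens of the input and the output, and flags outputs that are nearly identical to the input. The function names `token_overlap` and `is_near_copy` and the `0.8` threshold are illustrative assumptions. The check says nothing about fluency or whether the meaning is preserved.

```python
def token_overlap(source: str, paraphrase: str) -> float:
    """Jaccard similarity between the whitespace-token sets of two sentences."""
    src_tokens = set(source.split())
    par_tokens = set(paraphrase.split())
    if not src_tokens and not par_tokens:
        # Two empty sentences are treated as identical.
        return 1.0
    return len(src_tokens & par_tokens) / len(src_tokens | par_tokens)


def is_near_copy(source: str, paraphrase: str, threshold: float = 0.8) -> bool:
    """Flag outputs that reuse almost every token of the input.

    The default threshold is an arbitrary illustrative value, not a tuned one.
    """
    return token_overlap(source, paraphrase) >= threshold


if __name__ == "__main__":
    sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"
    # Comparing a sentence with itself gives the maximum overlap.
    print(token_overlap(sentence, sentence))
    print(is_near_copy(sentence, sentence))
```

Whitespace splitting keeps punctuation such as "۔" attached to the neighbouring word, so the score is only a rough signal. A high overlap suggests the output copied the input rather than rephrasing it.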
Contributing
Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.