---
inference: false
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
---

<h1 align="center">Urdu Paraphrase Generation Model</h1>

<p align="center">
  <b>Fine-tuned model for Urdu paraphrase generation</b>
</p>

## Model Description

The Urdu Paraphrase Generation Model is a language model trained on a 30k-row dataset of Urdu paraphrases (`mwz/ur_para`). It is based on the `roberta-urdu-small` architecture and fine-tuned for the task of generating high-quality paraphrases.

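The `mwz/ur_para` dataset listed in this card's metadata can be inspected with the Hugging Face `datasets` library (install it with `pip install datasets`). The snippet below is a minimal sketch; it assumes the dataset is publicly available on the Hub and makes no assumptions about its split or column names.

```python
from datasets import load_dataset

# Load the paraphrase dataset referenced in this card's metadata
dataset = load_dataset("mwz/ur_para")

# Inspect the available splits and column names before training or evaluation
print(dataset)
```
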
## Features

- Generate accurate and contextually relevant paraphrases in Urdu.
- Maintain linguistic nuances and syntactic structures of the original input.
- Handle a variety of input sentence lengths and complexities.

## Usage

### Installation

To use the Urdu Paraphrase Generation Model, follow these steps:

1. Install the `transformers` library (along with `torch`, which the examples below require):
```bash
pip install transformers torch
```
2. Load the model and tokenizer in your Python script:
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
```

## Generating Paraphrases

Use the following code snippet to generate paraphrases with the loaded model and tokenizer:

```python
# Example sentence
# (English gloss: "Conceptually, cream skimming has two basic dimensions - products and geography.")
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"

# Tokenize the input sentence
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)

# Generate a paraphrase
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)

paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
```

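To paraphrase several sentences, the helper below is a minimal sketch that simply reuses the tokenize, generate, and decode calls from the snippet above; the function name `paraphrase_batch` is illustrative and not part of the model's API.

```python
import torch

def paraphrase_batch(sentences, model, tokenizer, max_length=128):
    """Paraphrase a list of Urdu sentences by repeating the steps shown above."""
    paraphrases = []
    for sentence in sentences:
        # Same tokenization as the single-sentence example
        inputs = tokenizer(sentence, truncation=True, padding=True, return_tensors="pt")
        with torch.no_grad():
            outputs = model.generate(
                inputs.input_ids.to(model.device),
                attention_mask=inputs.attention_mask.to(model.device),
                max_length=max_length,
            )
        paraphrases.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return paraphrases
```

For larger workloads, the sentences could instead be tokenized together in one call (the tokenizer pads them to a common length) and passed to `generate` as a single batch.
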
## Performance

The model was fine-tuned on a 30k-row dataset of Urdu paraphrases. Detailed performance metrics, such as accuracy and fluency, are still being evaluated and will be added to this card once available.

## Contributing

Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.