---
inference: false
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
---
<h1 align="center">Urdu Paraphrase Generation Model</h1>
<p align="center">
<b>Fine-tuned model for Urdu paraphrase generation</b>
</p>
## Model Description
The Urdu Paraphrase Generation Model is a language model trained on a 30k-row dataset of Urdu paraphrases (`mwz/ur_para`). It is based on the `roberta-urdu-small` architecture, fine-tuned for the specific task of generating high-quality paraphrases.
## Features
- Generate accurate and contextually relevant paraphrases in Urdu.
- Maintain linguistic nuances and syntactic structures of the original input.
- Handle a variety of input sentence lengths and complexities.
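Because the model handles individual sentences best, longer passages can be split on the Urdu full stop (`۔`) and paraphrased sentence by sentence. The helper below is a minimal sketch of that pre-processing step; the function name `split_sentences` is illustrative, not part of the model's API.

```python
def split_sentences(text, delimiter="۔"):
    """Split Urdu text on the full-stop mark so long passages can be
    paraphrased one sentence at a time. Empty fragments are dropped and
    the delimiter is re-attached to each sentence."""
    parts = [p.strip() for p in text.split(delimiter)]
    return [p + delimiter for p in parts if p]
```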
## Usage
### Installation
To use the Urdu Paraphrase Generation Model, follow these steps:
1. Install the `transformers` library:
```bash
pip install transformers
```
2. Load the model and tokenizer in your Python script:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
```
## Generating Paraphrases
Use the following code snippet to generate paraphrases with the loaded model and tokenizer:
```python
import torch

# Example sentence
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"

# Tokenize the input sentence and move the tensors to the model's device
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)

# Generate a paraphrase (no gradients are needed at inference time)
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)

paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
```
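When sampling several candidate paraphrases per input, it can help to discard outputs that are empty or merely echo the source sentence. The helper below is a small, model-independent sketch of such post-processing; `filter_paraphrases` is a hypothetical name, not part of the `transformers` API.

```python
def filter_paraphrases(source, candidates):
    """Drop candidates that are empty or identical to the source sentence
    after whitespace normalisation, removing duplicates while preserving
    the original order of the remaining candidates."""
    def norm(s):
        return " ".join(s.split())

    seen = {norm(source)}
    kept = []
    for cand in candidates:
        n = norm(cand)
        if n and n not in seen:
            seen.add(n)
            kept.append(cand)
    return kept
```

For example, feeding the decoded outputs of several `model.generate` calls through this filter leaves only distinct, non-trivial paraphrases.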
## Performance
The model has been fine-tuned on the 30k-row `mwz/ur_para` dataset of Urdu paraphrases. Detailed performance metrics, such as accuracy and fluency, are still being evaluated and will be published here once available.
## Contributing
Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.