---
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
---
<h1 align="center">Urdu Paraphrase Generation Model</h1>
<p align="center">
<b>Fine-tuned model for Urdu paraphrase generation</b>
</p>
## Model Description
The Urdu Paraphrase Generation Model is a language model trained on a dataset of 30k rows of Urdu paraphrases (`mwz/ur_para`). It is based on the `roberta-urdu-small` architecture and fine-tuned for the task of paraphrase generation.
## Features
- Generate accurate and contextually relevant paraphrases in Urdu.
- Maintain linguistic nuances and syntactic structures of the original input.
- Handle a variety of input sentence lengths and complexities.
## Usage
### Installation
To use the Urdu Paraphrase Generation Model, follow these steps:
1. Install the `transformers` library and PyTorch (the generation snippet below uses `torch`):
```bash
pip install transformers torch
```
2. Load the model and tokenizer in your Python script:
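```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the model and tokenizer.
# Note: this ID points at the base `roberta-urdu-small` checkpoint;
# replace it with the fine-tuned model's repository ID if that is what you want to load.
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
model.eval()  # inference mode (disables dropout)
```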
### Generating Paraphrases
Use the following code snippet to generate paraphrases with the loaded model and tokenizer:
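```python
import torch

# Example sentence (roughly: "Conceptually, cream skimming has two basic
# dimensions - product and geography.")
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"

# Tokenize the input sentence and move the tensors to the model's device
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)

# Generate paraphrase (gradients are not needed at inference time).
# Note: generate() requires a model class with generation support; depending on
# the transformers version, a masked-LM class may raise an error here.
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)

# Decode the first generated sequence back to text
paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
```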
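The snippet above truncates the input and caps the output at `max_length=128`, so a long passage may be easier to handle one sentence at a time. The following is a minimal sketch of that idea, not part of the model's own API: `split_urdu_sentences` is a hypothetical helper, and splitting on the Urdu full stop (`۔`), the Arabic question mark (`؟`), `!`, and `?` is an assumption about the input text.

```python
import re
from typing import List

# Assumed sentence terminators: Urdu full stop (U+06D4),
# Arabic question mark (U+061F), plus "!" and "?".
_SENTENCE_END = re.compile(r"(?<=[\u06D4\u061F!?])\s+")


def split_urdu_sentences(text: str) -> List[str]:
    """Split text at whitespace that follows a sentence terminator.

    Hypothetical helper for illustration; terminators stay attached
    to the sentence they end.
    """
    parts = _SENTENCE_END.split(text.strip())
    return [part for part in parts if part]


if __name__ == "__main__":
    text = "یہ پہلا جملہ ہے۔ یہ دوسرا جملہ ہے۔"
    for sentence in split_urdu_sentences(text):
        # Each sentence could then be passed to the tokenizer separately.
        print(sentence)
```

Each returned sentence can then be tokenized and passed through the generation step on its own, which keeps every input well under the length cap.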
## Performance
The model was fine-tuned on a dataset of 30k rows of Urdu paraphrases. No quantitative results are reported yet; performance metrics, such as accuracy and fluency, are still being evaluated and will be added here once available.
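In the meantime, a rough sanity check is to compare the tokens of an input with those of its paraphrase. The sketch below is an illustrative assumption, not the evaluation the authors plan to report: `token_jaccard` is a hypothetical helper that computes the Jaccard similarity of whitespace-separated token sets. It measures only lexical overlap, not accuracy or fluency. A score near 1.0 suggests the output is close to a copy of the input, and a score near 0.0 suggests it shares almost no words with it.

```python
def token_jaccard(source: str, paraphrase: str) -> float:
    """Jaccard similarity of whitespace-token sets, in the range [0, 1].

    Hypothetical helper for illustration only; whitespace tokenization
    is a simplifying assumption.
    """
    source_tokens = set(source.split())
    paraphrase_tokens = set(paraphrase.split())
    if not source_tokens and not paraphrase_tokens:
        return 1.0  # two empty strings are treated as identical
    shared = source_tokens & paraphrase_tokens
    combined = source_tokens | paraphrase_tokens
    return len(shared) / len(combined)


if __name__ == "__main__":
    # Two of the four distinct tokens are shared.
    print(token_jaccard("a b c", "a b d"))  # prints 0.5
```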
## Contributing
Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.