---
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
---
<h1 align="center">Urdu Paraphrase Generation Model</h1>

<p align="center">
  <b>Fine-tuned model for Urdu paraphrase generation</b>
</p>

## Model Description

The Urdu Paraphrase Generation Model is a language model fine-tuned on a dataset of roughly 30,000 Urdu paraphrase pairs (`mwz/ur_para`). It is based on the `roberta-urdu-small` architecture, fine-tuned for the specific task of generating high-quality paraphrases.

## Features

- Generate accurate and contextually relevant paraphrases in Urdu.
- Maintain linguistic nuances and syntactic structures of the original input.
- Handle a variety of input sentence lengths and complexities.

## Usage

### Installation

To use the Urdu Paraphrase Generation Model, follow these steps:

1. Install the `transformers` library:
```bash
pip install transformers
```
2. Load the model and tokenizer in your Python script:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
```
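If a GPU is available, moving the model there speeds up generation noticeably. A minimal device-selection sketch in plain PyTorch (no model-specific assumptions):

```python
import torch

# Pick the best available device; inputs must later be moved to the same device as the model.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device.type}")

# model.to(device)  # uncomment once the model from the snippet above is loaded
```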
## Generating Paraphrases
Use the following code snippet to generate paraphrases with the loaded model and tokenizer:
```python
import torch

# Example sentence ("Conceptually, cream skimming has two basic dimensions - product and geography.")
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"

# Tokenize the input sentence
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)

# Generate a paraphrase (no gradients are needed at inference time)
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)

paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
```
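When sampling several candidates (e.g. via `num_return_sequences` in `generate`), trivial near-copies of the input are common. A minimal, model-agnostic filter based on word overlap can screen them out; the `0.9` threshold below is an illustrative assumption, not a tuned value:

```python
def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = set(a.split()), set(b.split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def filter_paraphrases(source, candidates, max_overlap=0.9):
    """Drop candidates that are (near-)identical to the source sentence."""
    return [c for c in candidates if word_overlap(source, c) < max_overlap]

# An exact copy of the source is dropped; a partially rewritten candidate is kept.
print(filter_paraphrases("a b c d", ["a b c d", "a b x y"]))  # → ['a b x y']
```

Word-level Jaccard overlap is a coarse heuristic; for production filtering an embedding-based similarity would be more robust, but this keeps the example dependency-free.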

## Performance
The model has been fine-tuned on a dataset of roughly 30,000 Urdu paraphrase pairs. Detailed performance metrics, such as accuracy and fluency, are still being evaluated and will be added here once available.

## Contributing
Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.