---
inference: false
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
---
# Urdu Paraphrasing Model
This repository contains a trained Urdu paraphrasing model based on a BERT encoder-decoder architecture. The model has been fine-tuned on the Urdu Paraphrase Dataset (`mwz/ur_para`) and generates paraphrases for input sentences in Urdu.
## Model Description
The model is built with the Hugging Face Transformers library and uses an encoder-decoder architecture: one BERT model serves as the encoder and a second BERT model as the decoder, both initialized from `bert-base-uncased`. The model is trained to generate paraphrases by reconstructing input sentences.
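For illustration, here is a minimal sketch of how such a BERT-to-BERT encoder-decoder can be assembled with Transformers. This reconstructs the architecture described above under assumed defaults; it is not the fine-tuned checkpoint or the exact training setup used for this model:
```python
from transformers import EncoderDecoderModel, BertTokenizer

# Tie two bert-base-uncased checkpoints together as encoder and decoder.
# This builds the untrained architecture only, not the released model.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The decoder needs explicit start/pad/eos token IDs before training or generation
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```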
## Usage
To use the trained model for paraphrasing Urdu sentences, you can follow the steps below:
1. Install the required dependencies. The exact command is not preserved in this card, but at minimum the example below needs `transformers` and `torch`:
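```bash
pip install transformers torch
```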
2. Load the trained model using the Hugging Face Transformers library:
```python
from transformers import EncoderDecoderModel, BertTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = EncoderDecoderModel.from_pretrained("mwz/UrduParaphraseBERT")
tokenizer = BertTokenizer.from_pretrained("mwz/UrduParaphraseBERT")

def paraphrase_urdu_sentence(sentence):
    # Tokenize the input, truncating to the model's 512-token limit
    inputs = tokenizer(sentence, truncation=True, max_length=512, return_tensors="pt")
    # Generate with beam search; no_repeat_ngram_size=2 blocks repeated bigrams
    generated_ids = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=128,
        num_beams=4,
        no_repeat_ngram_size=2,
    )
    # Decode the generated token IDs back into Urdu text
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)

sentence = "ایک مثالی روشنی کا مشہور نقطہ آبادی چھوٹی چھوٹی سڑکوں میں اپنے آپ کو خوشگوار کرسکتی ہے۔"
paraphrased_sentence = paraphrase_urdu_sentence(sentence)
print(paraphrased_sentence)
```
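You can trade speed for quality by adjusting the standard `generate()` parameters, for example raising `num_beams` for better candidates or lowering `max_length` for shorter outputs; these are generic Transformers options, not settings specific to this model. On a GPU, move the model and inputs first with `model.to("cuda")` and `inputs.to("cuda")`.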