---
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
pipeline_tag: text2text-generation
---
# Urdu Paraphrasing Model
This repository contains a pretrained model for Urdu paraphrasing. The model is based on the BERT architecture and has been fine-tuned on a large dataset of Urdu paraphrases.
## Model Description
The model is based on the BERT architecture and is designed for paraphrasing tasks in the Urdu language. It was fine-tuned on a large corpus of Urdu text to generate high-quality paraphrases.
## Model Details

- Model Name: Urdu-Paraphrasing-BERT
- Base Model: BERT
- Architecture: Transformer
- Language: Urdu
- Dataset: Urdu Paraphrasing Dataset `mwz/ur_para`
## How to Use
You can use this pretrained model for generating paraphrases for Urdu text. Here's an example of how to use the model:
```python
from transformers import pipeline

# Load the model
model = pipeline("text2text-generation", model="path_to_pretrained_model")

# Generate paraphrases (num_beams must be >= num_return_sequences
# to return multiple candidates with beam search)
input_text = "Urdu input text for paraphrasing."
paraphrases = model(input_text, max_length=128, num_beams=3, num_return_sequences=3)

# Print the generated paraphrases
print("Original Input Text:", input_text)
print("Generated Paraphrases:")
for paraphrase in paraphrases:
    print(paraphrase["generated_text"])
```
## Training

The model was trained using the Hugging Face `transformers` library. The training process involved fine-tuning the base BERT model on the Urdu Paraphrasing Dataset.
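The exact training script and hyperparameters are not published with this card. As an illustrative sketch only, one common way to fine-tune BERT for text2text generation with `transformers` is to pair two BERT checkpoints in an `EncoderDecoderModel` and train with `Seq2SeqTrainer`; the checkpoint name, column names, and hyperparameters below are assumptions, not the actual training configuration:

```python
# Illustrative configuration sketch, NOT the published training setup.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    EncoderDecoderModel,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

checkpoint = "bert-base-multilingual-cased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

dataset = load_dataset("mwz/ur_para")

def preprocess(batch):
    # Column names here are assumptions; check the dataset card for the real ones.
    inputs = tokenizer(batch["sentence"], max_length=128, truncation=True)
    labels = tokenizer(batch["paraphrase"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset["train"].map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="urdu-paraphrase-bert",
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```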
## Evaluation

The model's performance was evaluated on a separate validation set using metrics such as BLEU, ROUGE, and perplexity. Please note that evaluation results may vary depending on the specific use case.
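No evaluation script ships with this card. For a rough idea of what a BLEU-style score measures, here is a minimal, self-contained sketch of sentence-level n-gram precision with a brevity penalty (whitespace tokenization is a simplification for Urdu; a real evaluation would use a library such as `sacrebleu` or `evaluate`):

```python
# Minimal BLEU-style score: geometric mean of 1..4-gram precisions
# times a brevity penalty. For illustration only.
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped n-gram matches against the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Small epsilon avoids log(0) when no higher-order n-grams match
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("یہ ایک مثال ہے", "یہ ایک مثال ہے"))  # identical pair scores 1.0
```

A perfect paraphrase of the reference scores 1.0, while a candidate sharing no n-grams with the reference scores near 0.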
## Acknowledgments
- The pretrained model is based on the BERT architecture developed by Google Research.
## License
This model and the associated code are licensed under the MIT License.