---
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
---
Fine-tuned model for Urdu paraphrase generation

## Model Description

The Urdu Paraphrase Generation Model is a language model fine-tuned on a 30k-row dataset of Urdu paraphrases. It is based on the `roberta-urdu-small` architecture and adapted to the specific task of generating high-quality paraphrases.

## Features

- Generate accurate and contextually relevant paraphrases in Urdu.
- Maintain linguistic nuances and syntactic structures of the original input.
- Handle a variety of input sentence lengths and complexities.

## Usage

### Installation

To use the Urdu Paraphrase Generation Model, follow these steps:

1. Install the `transformers` library, along with PyTorch, which the snippets in this section rely on:

   ```bash
   pip install transformers torch
   ```

2. Load the model and tokenizer in your Python script:

   ```python
   from transformers import AutoModelForMaskedLM, AutoTokenizer

   # Load the model and tokenizer.
   # "urduhack/roberta-urdu-small" is the base checkpoint named in the Model Description.
   model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
   tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
   ```

## Generating Paraphrases

Use the following code snippet to generate paraphrases with the loaded model and tokenizer:

```python
import torch

# Example sentence
# (English: "Conceptually, cream skimming has two basic dimensions - product and geography.")
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"

# Tokenize the input sentence and move the tensors to the model's device
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)

# Generate a paraphrase without tracking gradients (inference only)
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)

# Decode the first generated sequence back into text
paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
```

## Performance

The model was fine-tuned on a 30k-row dataset of Urdu paraphrases. Detailed performance metrics, such as accuracy and fluency, are still being evaluated and will be added once they are available.

## Contributing

Contributions to the Urdu Paraphrase Generation Model are welcome!
If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
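
## Sanity-Checking Outputs

The Performance section notes that formal metrics are still pending. Until then, a rough lexical check can help flag generated paraphrases that merely copy the input. The sketch below is only an illustration and is not part of the model or its evaluation: the helper names `token_overlap` and `is_trivial_copy` are hypothetical, and whitespace tokenization is a simplifying assumption that ignores Urdu-specific normalization.

```python
def token_overlap(source: str, candidate: str) -> float:
    """Jaccard overlap between whitespace-separated token sets (0.0 to 1.0)."""
    src_tokens = set(source.split())
    cand_tokens = set(candidate.split())
    if not src_tokens and not cand_tokens:
        return 1.0  # two empty strings are treated as identical
    return len(src_tokens & cand_tokens) / len(src_tokens | cand_tokens)


def is_trivial_copy(source: str, candidate: str) -> bool:
    """True when the candidate equals the source after whitespace normalization."""
    return " ".join(source.split()) == " ".join(candidate.split())


# Hypothetical sentence pair, used only to show the calls
source = "یہ ایک مثال ہے"
candidate = "یہ ایک نمونہ ہے"
print("Overlap:", token_overlap(source, candidate))
print("Trivial copy:", is_trivial_copy(source, candidate))
```

A very high overlap, or a trivial copy, suggests the output is not a useful paraphrase. A low overlap alone does not show that the meaning was preserved, so this check complements human or metric-based evaluation and does not replace it.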