---
inference: false
license: mit
datasets:
- mwz/ur_para
language:
- ur
tags:
- paraphrase
---

<h1 align="center">Urdu Paraphrase Generation Model</h1>

<p align="center">
  <b>Fine-tuned model for Urdu paraphrase generation</b>
</p>

## Model Description

The Urdu Paraphrase Generation Model is a language model trained on a 30k-row dataset of Urdu paraphrases (`mwz/ur_para`). It is based on the `roberta-urdu-small` architecture and fine-tuned for the task of generating high-quality paraphrases.

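The `mwz/ur_para` dataset listed in this card's metadata can be inspected with the Hugging Face `datasets` library (install it with `pip install datasets`). The snippet below is a minimal sketch; it assumes the dataset is publicly available on the Hub and makes no assumptions about its split or column names.

```python
from datasets import load_dataset

# Load the paraphrase dataset referenced in this card's metadata
dataset = load_dataset("mwz/ur_para")

# Inspect the available splits and column names before training or evaluation
print(dataset)
```
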
## Features

- Generate accurate and contextually relevant paraphrases in Urdu.
- Maintain linguistic nuances and syntactic structures of the original input.
- Handle a variety of input sentence lengths and complexities.

## Usage

### Installation

To use the Urdu Paraphrase Generation Model, follow these steps:

1. Install the `transformers` library (along with `torch`, which the examples below require):
```bash
pip install transformers torch
```
2. Load the model and tokenizer in your Python script:
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
```

## Generating Paraphrases

Use the following code snippet to generate paraphrases with the loaded model and tokenizer:

```python
# Example sentence
# (English gloss: "Conceptually, cream skimming has two basic dimensions - products and geography.")
input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"

# Tokenize the input sentence
inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)
attention_mask = inputs.attention_mask.to(model.device)

# Generate a paraphrase
with torch.no_grad():
    outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)

paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Paraphrase:", paraphrase)
```

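To paraphrase several sentences, the helper below is a minimal sketch that simply reuses the tokenize, generate, and decode calls from the snippet above; the function name `paraphrase_batch` is illustrative and not part of the model's API.

```python
import torch

def paraphrase_batch(sentences, model, tokenizer, max_length=128):
    """Paraphrase a list of Urdu sentences by repeating the steps shown above."""
    paraphrases = []
    for sentence in sentences:
        # Same tokenization as the single-sentence example
        inputs = tokenizer(sentence, truncation=True, padding=True, return_tensors="pt")
        with torch.no_grad():
            outputs = model.generate(
                inputs.input_ids.to(model.device),
                attention_mask=inputs.attention_mask.to(model.device),
                max_length=max_length,
            )
        paraphrases.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return paraphrases
```

For larger workloads, the sentences could instead be tokenized together in one call (the tokenizer pads them to a common length) and passed to `generate` as a single batch.
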
## Performance

The model was fine-tuned on a 30k-row dataset of Urdu paraphrases. Detailed performance metrics, such as accuracy and fluency, are still being evaluated and will be added to this card once available.

## Contributing

Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.