mwz committed on
Commit 15282a7
1 Parent(s): 4877ad1

Update README.md

Files changed (1)
  1. README.md +20 -52
README.md CHANGED
@@ -1,5 +1,4 @@
 ---
- inference: false
 license: mit
 datasets:
 - mwz/ur_para
@@ -8,65 +7,34 @@ language:
 tags:
 - 'paraphrase '
 ---
- <h1 align="center">Urdu Paraphrase Generation Model</h1>
+ # Urdu Paraphrasing Model
 
- <p align="center">
- <b>Fine-tuned model for Urdu paraphrase generation</b>
- </p>
+ This repository contains a trained Urdu paraphrasing model based on a BERT encoder-decoder architecture. The model has been fine-tuned on the Urdu Paraphrase Dataset (mwz/ur_para) and can generate paraphrases for input sentences in Urdu.
 
 ## Model Description
 
- The Urdu Paraphrase Generation Model is a language model trained on a 30k-row dataset of Urdu paraphrases. It is based on the `roberta-urdu-small` architecture, fine-tuned for the task of generating high-quality paraphrases.
- 
- ## Features
- 
- - Generates accurate and contextually relevant paraphrases in Urdu.
- - Maintains the linguistic nuances and syntactic structure of the original input.
- - Handles a variety of input sentence lengths and complexities.
+ The model is built with the Hugging Face Transformers library and trained from the `bert-base-uncased` checkpoint. It uses an encoder-decoder architecture in which one BERT model serves as the encoder and another BERT model as the decoder; the model is trained to generate paraphrases by reconstructing the input sentences.
 
 ## Usage
 
- ### Installation
- 
- To use the Urdu Paraphrase Generation Model, follow these steps:
- 
- 1. Install the `transformers` library:
- ```bash
- pip install transformers
- ```
- 2. Load the model and tokenizer in your Python script:
- ```python
- from transformers import AutoModelForMaskedLM, AutoTokenizer
+ To use the trained model for paraphrasing Urdu sentences, you can follow the steps below:
+ 
+ 1. Install the required dependencies (the `transformers` and `torch` packages).
+ 2. Load the trained model using the Hugging Face Transformers library:
+ ```python
+ from transformers import EncoderDecoderModel, BertTokenizer
 
 # Load the model and tokenizer
- model = AutoModelForMaskedLM.from_pretrained("urduhack/roberta-urdu-small")
- tokenizer = AutoTokenizer.from_pretrained("urduhack/roberta-urdu-small")
- ```
- 
- ## Generating Paraphrases
- 
- Use the following code snippet to generate paraphrases with the loaded model and tokenizer:
- ```python
- import torch
- 
- # Example sentence
- input_sentence = "تصوراتی طور پر کریم سکمنگ کی دو بنیادی جہتیں ہیں - مصنوعات اور جغرافیہ۔"
- 
- # Tokenize the input sentence
- inputs = tokenizer(input_sentence, truncation=True, padding=True, return_tensors="pt")
- input_ids = inputs.input_ids.to(model.device)
- attention_mask = inputs.attention_mask.to(model.device)
- 
- # Generate a paraphrase
- with torch.no_grad():
-     outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=128)
- 
- paraphrase = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print("Paraphrase:", paraphrase)
+ model = EncoderDecoderModel.from_pretrained("mwz/UrduParaphraseBERT")
+ tokenizer = BertTokenizer.from_pretrained("mwz/UrduParaphraseBERT")
+ 
+ def paraphrase_urdu_sentence(sentence):
+     input_ids = tokenizer.encode(sentence, padding="longest", truncation=True, max_length=512, return_tensors="pt")
+     generated_ids = model.generate(input_ids=input_ids, max_length=128, num_beams=4, no_repeat_ngram_size=2)
+     paraphrase = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
+     return paraphrase
+ 
+ sentence = "ایک مثالی روشنی کا مشہور نقطہ آبادی چھوٹی چھوٹی سڑکوں میں اپنے آپ کو خوشگوار کرسکتی ہے۔"
+ paraphrased_sentence = paraphrase_urdu_sentence(sentence)
+ print(paraphrased_sentence)
 ```
- 
- ## Performance
- 
- The model has been fine-tuned on a 30k-row dataset of Urdu paraphrases and generates high-quality paraphrases. Detailed performance metrics, such as accuracy and fluency, are still being evaluated and will be added soon.
- 
- ## Contributing
- 
- Contributions to the Urdu Paraphrase Generation Model are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
- 
- ## License
- 
- This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.