prithivida
committed on
Commit
·
e922b4d
1
Parent(s):
165d869
Create README.md
README.md
ADDED
@@ -0,0 +1,95 @@
# Parrot

## 1. What is Parrot?
Parrot is a paraphrase-based utterance augmentation framework purpose-built to accelerate training NLU models. A paraphrase framework is more than just a paraphrasing model.

## 2. Why Parrot?
**Hugging Face** lists [12 paraphrase models](https://huggingface.co/models?pipeline_tag=text2text-generation&search=paraphrase), **RapidAPI** lists 7 freemium and commercial paraphrasers like [QuillBot](https://rapidapi.com/search/paraphrase?section=apis&page=1), Rasa has discussed an experimental paraphraser for augmenting text data [here](https://forum.rasa.com/t/paraphrasing-for-nlu-data-augmentation-experimental/27744), Sentence-Transformers offers a [paraphrase mining utility](https://www.sbert.net/examples/applications/paraphrase-mining/README.html), and [NLPAug](https://github.com/makcedward/nlpaug) offers word-level augmentation with [PPDB](http://paraphrase.org/#/download) (a multi-million paraphrase database). While these attempts at paraphrasing are great, there are still some gaps, and paraphrasing is NOT yet a mainstream option for text augmentation in building NLU models. Parrot is a humble attempt to fill some of these gaps.

**What is a good paraphrase?** Almost all conditioned text generation models are validated on 2 factors: (1) whether the generated text conveys the same meaning as the original context (Adequacy), and (2) whether the text is fluent, grammatically correct English (Fluency). For instance, Neural Machine Translation outputs are tested for Adequacy and Fluency. But [a good paraphrase](https://www.aclweb.org/anthology/D10-1090.pdf) should be adequate and fluent while being as different as possible in its surface lexical form. With respect to this definition, the **3 key metrics** that measure the quality of paraphrases are:
- **Adequacy** (Is the meaning preserved adequately?)
- **Fluency** (Is the paraphrase fluent English?)
- **Diversity (Lexical / Phrasal / Syntactical)** (How much has the paraphrase changed the original sentence?)

*Parrot offers knobs to control Adequacy, Fluency and Diversity as per your needs.*
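
One way to make the Diversity metric concrete is a normalized edit distance between the input and a paraphrase. The sketch below is only an illustration of that idea and is not Parrot's internal implementation; the helper names `levenshtein` and `diversity_score` are hypothetical, and character-level distance is an assumption (a word-level distance would work the same way).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def diversity_score(source: str, paraphrase: str) -> float:
    """Edit distance scaled to [0, 1]; 0 means identical (hypothetical helper)."""
    source, paraphrase = source.lower(), paraphrase.lower()
    longest = max(len(source), len(paraphrase))
    if longest == 0:
        return 0.0
    return levenshtein(source, paraphrase) / longest
```

A higher score means the paraphrase differs more from the input on the surface, which is only useful once adequacy and fluency have been checked separately.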

**What makes a paraphraser a good augmentor?** To train an NLU model, we don't just need a lot of utterances; we need utterances with intents and slots/entities annotated. A typical flow would be:
- Given an **input utterance + input annotations**, a good augmentor produces N **output paraphrases** while preserving the intent and slots.
- The output paraphrases are then converted into annotated data using the input annotations from the first step.
- The annotated data created from the output paraphrases then forms the training dataset for your NLU model.

But in general, being generative models, paraphrasers are not guaranteed to preserve the slots/entities. So the ability to generate high-quality paraphrases in a constrained fashion, without trading off the intents and slots for lexical dissimilarity, is what makes a paraphraser a good augmentor. *More on this in section 3 below.*
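
The flow above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Parrot's API: a paraphrase is treated as usable only if every slot value appears in it verbatim (case-insensitively), and the names `annotate_paraphrases`, `intent`, and `slots` are hypothetical.

```python
from typing import Dict, List


def annotate_paraphrases(
    paraphrases: List[str], intent: str, slots: Dict[str, str]
) -> List[dict]:
    """Keep paraphrases that still contain every slot value and re-annotate them."""
    annotated = []
    for text in paraphrases:
        lowered = text.lower()
        entities = []
        for slot_name, slot_value in slots.items():
            start = lowered.find(slot_value.lower())
            if start == -1:
                break  # slot value was lost, so this paraphrase is discarded
            entities.append(
                {"slot": slot_name, "start": start, "end": start + len(slot_value)}
            )
        else:
            # Runs only when no slot was missing.
            annotated.append({"text": text, "intent": intent, "entities": entities})
    return annotated


# The second paraphrase and the intent name are made up for illustration.
examples = annotate_paraphrases(
    paraphrases=[
        "which upscale restaurants are recommended in rome?",
        "what are the best places to eat?",
    ],
    intent="restaurant_search",
    slots={"city": "Rome"},
)
```

Here the second paraphrase is dropped because the city slot is missing, which is the kind of loss an unconstrained generative paraphraser can introduce.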

### Installation
```shell
pip install parrot
```

### Quickstart
```python
from parrot import Parrot
import warnings

warnings.filterwarnings("ignore")

# Load the pre-trained paraphraser (set use_gpu=False if no GPU is available)
parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=True)

phrases = [
    "Can you recommed some upscale restaurants in Rome?",
    "What are the famous places we should not miss in Russia?",
]

for phrase in phrases:
    print("-" * 100)
    print("Input_phrase:", phrase)
    print("-" * 100)
    # Generate paraphrases for the current input phrase
    para_phrases = parrot.augment(input_phrase=phrase)
    for para_phrase in para_phrases:
        print(para_phrase)
```

<pre>
-----------------------------------------------------------------------------
Input_phrase: Can you recommed some upscale restaurants in Rome?
-----------------------------------------------------------------------------
"which upscale restaurants are recommended in rome?"
"which are the best restaurants in rome?"
"are there any upscale restaurants near rome?"
"can you recommend a good restaurant in rome?"
"can you recommend some of the best restaurants in rome?"
"can you recommend some best restaurants in rome?"
"can you recommend some upscale restaurants in rome?"
-----------------------------------------------------------------------------
Input_phrase: What are the famous places we should not miss in Russia?
-----------------------------------------------------------------------------
"which are the must do places for tourists to visit in russia?"
"what are the best places to visit in russia?"
"what are some of the most visited sights in russia?"
"what are some of the most beautiful places in russia that tourists should not miss?"
"which are some of the most beautiful places to visit in russia?"
"what are some of the most important places to visit in russia?"
"what are some of the most famous places of russia?"
"what are some places we should not miss in russia?"
</pre>

### Knobs

```python
para_phrases = parrot.augment(
    input_phrase=phrase,
    diversity_ranker="levenshtein",
    do_diverse=False,
    max_return_phrases=10,
    max_length=32,
    adequacy_threshold=0.99,
    fluency_threshold=0.90,
)
```
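
To illustrate how threshold knobs of this kind typically interact, the sketch below filters scored candidates and caps the number returned. It is an assumption-laden illustration and not Parrot's internal logic: the `Candidate` type, the `select_paraphrases` function, the ranking by diversity, and all scores are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    adequacy: float   # hypothetical score in [0, 1]
    fluency: float    # hypothetical score in [0, 1]
    diversity: float  # hypothetical score in [0, 1]


def select_paraphrases(
    candidates: List[Candidate],
    adequacy_threshold: float = 0.99,
    fluency_threshold: float = 0.90,
    max_return_phrases: int = 10,
) -> List[str]:
    """Drop candidates below either threshold, then return the most diverse ones."""
    kept = [
        c for c in candidates
        if c.adequacy >= adequacy_threshold and c.fluency >= fluency_threshold
    ]
    kept.sort(key=lambda c: c.diversity, reverse=True)
    return [c.text for c in kept[:max_return_phrases]]


# Scores below are invented purely to exercise the function.
candidates = [
    Candidate("which upscale restaurants are recommended in rome?", 0.995, 0.97, 0.40),
    Candidate("which are the best restaurants in rome?", 0.90, 0.98, 0.55),
    Candidate("can you recommend some upscale restaurants in rome?", 0.999, 0.99, 0.05),
]
selected = select_paraphrases(candidates)
```

Under these assumptions, raising `adequacy_threshold` trades recall for safer paraphrases, while lowering it lets through more diverse candidates that may drift from the original meaning.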

## 3. Scope

In the space of conversational engines, knowledge bots are the ones to which **we ask questions** like *"When was the Berlin Wall torn down?"*, transactional bots are the ones to which **we give commands** like *"Turn on the music please"*, and voice assistants are the ones that can both answer questions and act on our commands. Parrot mainly focuses on augmenting texts typed into or spoken to conversational interfaces for building robust NLU models. (*People usually neither type out nor yell out long paragraphs to conversational interfaces, so the pre-trained model is trained on text samples with a maximum length of 64.*)

*While Parrot predominantly aims to be a text augmentor for building good NLU models, it can also be used as a pure-play paraphraser.*