Haonan Liu
committed on
Commit
·
248ece2
1
Parent(s):
2e7eced
update app and add doc
README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1 |
---
|
2 |
title: GPTagger
|
3 |
-
emoji:
|
4 |
colorFrom: red
|
5 |
colorTo: pink
|
6 |
sdk: gradio
|
@@ -11,3 +11,80 @@ license: gpl-3.0
|
|
11 |
---
|
12 |
|
13 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
1 |
---
|
2 |
title: GPTagger
|
3 |
+
emoji: 🏷️
|
4 |
colorFrom: red
|
5 |
colorTo: pink
|
6 |
sdk: gradio
|
|
|
11 |
---
|
12 |
|
13 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
14 |
+
|
15 |
+
# [GPTagger](https://github.com/hnliu-git/GPTagger) :label:
|
16 |
+
|
17 |
+
GPT Tagger is a text tagger built on the GPT model. It extracts tags from a given text by prompting GPT. However, using GPT as a text tagger is not trivial: GPT tends to return text that does not appear in the input, is fabricated, or has been reformatted. To mitigate this, GPT Tagger provides a reliable way to ensure that the generated tags are derived from the input text, while still allowing GPT to process the extracted tags to some extent.
|
18 |
+
|
19 |
+
Below is an example of how GPT may respond incorrectly.
|
20 |
+
|
21 |
+
```md
|
22 |
+
Text: "I earn $1000 this week!"
|
23 |
+
Prompt: "Extract how much he/she earns"
|
24 |
+
|
25 |
+
# Non-existent text
|
26 |
+
GPT: "one thousand dollar"
|
27 |
+
# Made-up text
|
28 |
+
GPT: "$999999"
|
29 |
+
# Processed text
|
30 |
+
GPT: "$1,000"
|
31 |
+
```
|
32 |
+
|
33 |
+
## Introduction
|
34 |
+
|
35 |
+
These incorrect responses show why a reliable tag extraction tool like GPT Tagger is needed. GPT Tagger works in three main steps:
|
36 |
+
1. 🕵️‍♂️ Extraction: GPT Tagger sniffs out all possible tags by following your instructions to GPT.
|
37 |
+
2. Indexing: It spots the exact locations of these tags within the text.
|
38 |
+
3. ✅ Validation: GPT Tagger's trusty validator checks whether the extracted tags pass the rule-based and ML-based checks.
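
The indexing step can be illustrated with a minimal sketch. This is not GPTagger's actual implementation: `index_tags` is a hypothetical helper, and the candidate list stands in for raw GPT output. It shows how looking up each candidate verbatim in the input text discards fabricated or reformatted answers.

```python
import re


def index_tags(text: str, candidates: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, tag) for every verbatim occurrence of a candidate.

    Hypothetical helper: candidates that never occur in the text produce
    no span, so fabricated or reformatted strings are dropped.
    """
    spans = []
    for cand in candidates:
        for match in re.finditer(re.escape(cand), text):
            spans.append((match.start(), match.end(), cand))
    return sorted(spans)


text = "I earn $1000 this week!"
# Assumed GPT outputs: one verbatim, one made up, one reformatted.
candidates = ["$1000", "$999999", "$1,000"]
spans = index_tags(text, candidates)
print(spans)
```

Only the candidate that appears character-for-character in the text survives; the other two have no location and are filtered out before validation.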
|
39 |
+
|
40 |
+
See the example above for how we extract ingredients from a yummy recipe text.
|
41 |
+
|
42 |
+
## Features ✨
|
43 |
+
|
44 |
+
### Scale up GPT annotators and switch between GPT-3.5 and GPT-4 easily
|
45 |
+
- Want higher precision? Try using GPT-4!
|
46 |
+
- Want higher recall? Scale up the number of GPT annotators!
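
One way to see why more annotators can raise recall is to take the union of the tags returned by several calls. The sketch below assumes that merging strategy for illustration only; how GPTagger actually combines calls is not stated here. `merge_annotations` and `fake_annotate` are hypothetical names, and `fake_annotate` replaces a real GPT call with canned outputs.

```python
from typing import Callable


def merge_annotations(
    annotate: Callable[[str, int], list[str]], text: str, nr_calls: int
) -> list[str]:
    """Union the tags from nr_calls annotator calls, keeping first-seen order."""
    merged: list[str] = []
    for call_idx in range(nr_calls):
        for tag in annotate(text, call_idx):
            if tag not in merged:
                merged.append(tag)
    return merged


def fake_annotate(text: str, call_idx: int) -> list[str]:
    """Stand-in for a GPT call: each call finds a different subset of tags."""
    outputs = [["flour"], ["flour", "sugar"], ["eggs"]]
    return outputs[call_idx % len(outputs)]


doc = "Mix flour, sugar and eggs."
single = merge_annotations(fake_annotate, doc, nr_calls=1)
merged = merge_annotations(fake_annotate, doc, nr_calls=3)
print(single, merged)
```

A single call misses two of the three ingredients in this toy setup, while three calls together cover all of them.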
|
47 |
+
|
48 |
+
### Instead of crafting a perfect prompt, use validators to shave off bad extractions
|
49 |
+
- Simple validator: Length, Regex...
|
50 |
+
- ML validator: GPT validator (Consider it like a chain of GPTs!)
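
The simple validators can be pictured as small predicates applied in sequence. The following is a sketch under stated assumptions, not the library's API: `max_length`, `matches`, and `run_validators` are hypothetical names, and the ML (GPT) validator is left out because it would require a model call.

```python
import re
from typing import Callable

# A validator takes a tag and says whether to keep it (hypothetical type alias).
Validator = Callable[[str], bool]


def max_length(limit: int) -> Validator:
    """Accept tags of at most `limit` characters."""
    return lambda tag: len(tag) <= limit


def matches(pattern: str) -> Validator:
    """Accept tags in which the regex matches somewhere."""
    compiled = re.compile(pattern)
    return lambda tag: compiled.search(tag) is not None


def run_validators(tags: list[str], validators: list[Validator]) -> list[str]:
    """Keep only the tags that pass every validator."""
    return [tag for tag in tags if all(check(tag) for check in validators)]


extracted = ["12 May 2023", "sometime soon", "x" * 200]
kept = run_validators(extracted, [max_length(128), matches(r"\d")])
print(kept)
```

Because each validator is an independent predicate, a looser prompt can over-extract and the chain removes what does not fit.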
|
51 |
+
|
52 |
+
## How to Use
|
53 |
+
|
54 |
+
### Setup
|
55 |
+
|
56 |
+
```shell
|
57 |
+
make install
|
58 |
+
export OPENAI_API_KEY=<your-key>
|
59 |
+
```
|
60 |
+
|
61 |
+
### Pre-defined NER pipeline
|
62 |
+
|
63 |
+
The easiest way to dive into the GPT Tagger is through the Gradio web demo! Fire it up with a single command:
|
64 |
+
```shell
|
65 |
+
poetry run python GPTagger/app.py
|
66 |
+
```
|
67 |
+
|
68 |
+
If you prefer having the power of GPT Tagger at your fingertips in Python, check out this snippet:
|
69 |
+
|
70 |
+
```python
|
71 |
+
from pathlib import Path
|
72 |
+
from GPTagger import *
|
73 |
+
|
74 |
+
cfg = NerConfig(
|
75 |
+
    tag_name='date',
|
76 |
+
    tag_regex=r"\d",
|
77 |
+
    tag_max_len=128,
|
78 |
+
)
|
79 |
+
prompt = PromptTemplate.from_template(Path('<path-to-prompt>').read_text())
|
80 |
+
pipeline = NerPipeline.from_config(cfg)
|
81 |
+
|
82 |
+
doc = Path('<path-to-doc>').read_text()
|
83 |
+
tags = pipeline(doc, prompt)
|
84 |
+
```
|
85 |
+
|
86 |
+
### Build Custom Pipelines
|
87 |
+
|
88 |
+
We believe that the possibilities of using GPT as a text tagger are endless! We invite you to contribute your own custom pipelines. Together, we'll unlock the true potential of GPT Tagger and make text tagging a better experience.
|
89 |
+
|
90 |
+
Leave a star if you find GPTagger useful for your product or company!
|
app.py
CHANGED
@@ -1,3 +1,4 @@
|
|
|
|
1 |
import gradio as gr
|
2 |
|
3 |
from GPTagger import *
|
@@ -14,20 +15,20 @@ TEXT:
|
|
14 |
|
15 |
def ner(
|
16 |
model: str,
|
17 |
-
|
18 |
tag_name: str,
|
19 |
tag_max_len: int,
|
20 |
text: str,
|
21 |
prompt: str,
|
|
|
22 |
):
|
23 |
-
|
|
|
24 |
tag_name=tag_name,
|
|
|
25 |
model=model,
|
26 |
-
|
27 |
-
tag_max_len=tag_max_len,
|
28 |
)
|
29 |
-
|
30 |
-
ner_pipeline = NerPipeline.from_config(cfg)
|
31 |
template = PromptTemplate.from_template(prompt)
|
32 |
|
33 |
extractions = ner_pipeline(text, template, "")
|
@@ -42,26 +43,36 @@ def ner(
|
|
42 |
return {"text": text, "entities": output}
|
43 |
|
44 |
|
45 |
-
with gr.Blocks(
46 |
with gr.Row():
|
47 |
-
tag_name = gr.Textbox(label="tag
|
48 |
tag_max_len = gr.Slider(
|
49 |
-
minimum=10, maximum=1000, step=10, label="
|
50 |
)
|
51 |
with gr.Row():
|
52 |
model = gr.Dropdown(
|
53 |
["gpt-3.5-turbo-0613", "gpt-4-0613"],
|
54 |
-
label="
|
55 |
value="gpt-3.5-turbo-0613",
|
56 |
)
|
57 |
nr_call = gr.Number(label="nr_of_calls", minimum=1, value=1, precision=0)
|
58 |
with gr.Row():
|
59 |
prompt = gr.TextArea(
|
60 |
placeholder="Enter your prompt here...",
|
61 |
-
label="prompt",
|
62 |
value=default_prompt,
|
63 |
)
|
64 |
-
text = gr.TextArea(placeholder="Enter your text here...", label="
|
65 |
btn = gr.Button("Submit")
|
66 |
output = gr.HighlightedText()
|
67 |
btn.click(
|
@@ -73,6 +84,7 @@ with gr.Blocks(theme=gr.themes.Default(text_size=gr.themes.sizes.text_lg)) as de
|
|
73 |
tag_max_len,
|
74 |
text,
|
75 |
prompt,
|
|
|
76 |
],
|
77 |
outputs=output,
|
78 |
)
|
|
|
1 |
+
import os
|
2 |
import gradio as gr
|
3 |
|
4 |
from GPTagger import *
|
|
|
15 |
|
16 |
def ner(
|
17 |
model: str,
|
18 |
+
nr_calls: int,
|
19 |
tag_name: str,
|
20 |
tag_max_len: int,
|
21 |
text: str,
|
22 |
prompt: str,
|
23 |
+
key: str,
|
24 |
):
|
25 |
+
os.environ['OPENAI_API_KEY'] = key
|
26 |
+
ner_pipeline = NerPipeline(
|
27 |
tag_name=tag_name,
|
28 |
+
nr_calls=nr_calls,
|
29 |
model=model,
|
30 |
+
tag_max_len=tag_max_len
|
|
|
31 |
)
|
|
|
|
|
32 |
template = PromptTemplate.from_template(prompt)
|
33 |
|
34 |
extractions = ner_pipeline(text, template, "")
|
|
|
43 |
return {"text": text, "entities": output}
|
44 |
|
45 |
|
46 |
+
with gr.Blocks() as demo:
|
47 |
+
gr.Markdown(
|
48 |
+
"""
|
49 |
+
# GPTagger 🏷️
|
50 |
+
|
51 |
+
[GPTagger](https://github.com/hnliu-git/GPTagger) is a powerful text tagger that makes use of the GPT model. This tool allows you to extract tags from a given text by leveraging the capabilities of GPT.
|
52 |
+
Simply specify the tag you want to extract in the prompt, and the matches will be highlighted in the output.
|
53 |
+
"""
|
54 |
+
)
|
55 |
+
with gr.Row():
|
56 |
+
key = gr.Textbox(label='OpenAI API Key:')
|
57 |
with gr.Row():
|
58 |
+
tag_name = gr.Textbox(label="Tag Name:", placeholder='Enter the tag you want to extract')
|
59 |
tag_max_len = gr.Slider(
|
60 |
+
minimum=10, maximum=1000, step=10, label="Max length of a tag", value=50
|
61 |
)
|
62 |
with gr.Row():
|
63 |
model = gr.Dropdown(
|
64 |
["gpt-3.5-turbo-0613", "gpt-4-0613"],
|
65 |
+
label="Model Name:",
|
66 |
value="gpt-3.5-turbo-0613",
|
67 |
)
|
68 |
nr_call = gr.Number(label="nr_of_calls", minimum=1, value=1, precision=0)
|
69 |
with gr.Row():
|
70 |
prompt = gr.TextArea(
|
71 |
placeholder="Enter your prompt here...",
|
72 |
+
label="Prompt: (Please include the default prompt at the end)",
|
73 |
value=default_prompt,
|
74 |
)
|
75 |
+
text = gr.TextArea(placeholder="Enter your text here...", label="Text")
|
76 |
btn = gr.Button("Submit")
|
77 |
output = gr.HighlightedText()
|
78 |
btn.click(
|
|
|
84 |
tag_max_len,
|
85 |
text,
|
86 |
prompt,
|
87 |
+
key
|
88 |
],
|
89 |
outputs=output,
|
90 |
)
|