Haonan Liu committed on
Commit 248ece2 · 1 Parent(s): 2e7eced

update app and add doc

Files changed (2)
  1. README.md +78 -1
  2. app.py +24 -12
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  title: GPTagger
- emoji: 📉
+ emoji: 🏷️
  colorFrom: red
  colorTo: pink
  sdk: gradio
@@ -11,3 +11,80 @@ license: gpl-3.0
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # [GPTagger](https://github.com/hnliu-git/GPTagger) :label:
+
+ GPT Tagger is a text tagger built on the GPT model: it extracts tags from a given text by following the instructions in your prompt. Using GPT as a text tagger is not trivial, however, because GPT tends to return text that does not appear in the input, that is fabricated, or that has been processed. To mitigate this, GPT Tagger checks that the generated tags are derived from the input text, while still allowing GPT to process the extracted tags to some extent.
+
+ Below is an example of how GPT may respond incorrectly.
+
+ ```md
+ Text: "I earn $1000 this week!"
+ Prompt: "Extract how much he/she earns"
+
+ # Non-existent text
+ GPT: "one thousand dollar"
+ # Made-up text
+ GPT: "$999999"
+ # Processed text
+ GPT: "$1,000"
+ ```
+
+ ## Introduction
+
+ These incorrect responses show why a reliable tag extraction tool like GPT Tagger is needed. GPT Tagger works in three main steps:
+ 1. 🕵️‍♀️ Extraction: GPT Tagger sniffs out all possible tags by following your instructions to GPT.
+ 2. 🔍 Indexing: It spots the exact locations of these tags within the text.
+ 3. ✅ Validation: GPT Tagger's trusty validator checks whether the extracted tags pass the rule-based and ML-based checks.
+
+ Check the example above to see how we extract ingredients from a yummy recipe text. 😋
+
+ ## Features ✨
+
+ ### Scale up GPT annotators and switch between GPT-3.5 and GPT-4 easily
+ - Want higher precision? Try using GPT-4!
+ - Want higher recall? Scale up the number of GPT annotators!
+
+ ### Instead of crafting a perfect prompt, use validators to shave off bad extractions
+ - Simple validator: Length, Regex...
+ - ML validator: GPT validator (think of it as a chain of GPTs!)
+
+ ## How to Use 🚀
+
+ ### Setup
+
+ ```shell
+ make install
+ export OPENAI_API_KEY=<your-key>
+ ```
+
+ ### Pre-defined NER pipeline
+
+ The easiest way to dive into GPT Tagger is through the Gradio web demo! Fire it up with a single command:
+ ```shell
+ poetry run python GPTagger/app.py
+ ```
+
+ If you prefer having the power of GPT Tagger at your fingertips in Python, check out this snippet:
+
+ ```python
+ from pathlib import Path
+ from GPTagger import *
+
+ cfg = NerConfig(
+     tag_name='date',
+     tag_regex=r"\d",
+     tag_max_len=128,
+ )
+ prompt = PromptTemplate.from_template(Path('<path-to-prompt>').read_text())
+ pipeline = NerPipeline.from_config(cfg)
+
+ doc = Path('<path-to-doc>').read_text()
+ tags = pipeline(doc, prompt)
+ ```
+
+ ### Build Custom Pipelines 🎉
+
+ We believe that the possibilities of using GPT as a text tagger are endless! We invite you to contribute your own custom pipelines. Together, we'll unlock the true potential of GPT Tagger and make text tagging a better experience.
+
+ Leave a star if you find GPTagger useful for your product or company! 🌟
app.py CHANGED
@@ -1,3 +1,4 @@
+ import os
  import gradio as gr
 
  from GPTagger import *
@@ -14,20 +15,20 @@ TEXT:
 
  def ner(
      model: str,
-     nr_call: int,
+     nr_calls: int,
      tag_name: str,
      tag_max_len: int,
      text: str,
      prompt: str,
+     key: str,
  ):
-     cfg = NerConfig(
+     os.environ['OPENAI_API_KEY'] = key
+     ner_pipeline = NerPipeline(
          tag_name=tag_name,
+         nr_calls=nr_calls,
          model=model,
-         nr_calls=nr_call,
-         tag_max_len=tag_max_len,
+         tag_max_len=tag_max_len
      )
-
-     ner_pipeline = NerPipeline.from_config(cfg)
      template = PromptTemplate.from_template(prompt)
 
      extractions = ner_pipeline(text, template, "")
@@ -42,26 +43,36 @@ def ner(
      return {"text": text, "entities": output}
 
 
- with gr.Blocks(theme=gr.themes.Default(text_size=gr.themes.sizes.text_lg)) as demo:
+ with gr.Blocks() as demo:
+     gr.Markdown(
+         """
+         # GPTagger 🏷️
+
+         [GPTagger](https://github.com/hnliu-git/GPTagger) is a powerful text tagger that makes use of the GPT model. This tool allows you to extract tags from a given text by leveraging the capabilities of GPT.
+         Simply specify the tag you want to extract from the text using prompt, you will get them highlighted in the output.
+         """
+     )
+     with gr.Row():
+         key = gr.Textbox(label='OpenAI API Key:')
      with gr.Row():
-         tag_name = gr.Textbox(label="tag name")
+         tag_name = gr.Textbox(label="Tag Name:", placeholder='Enter the tag you want to extract')
          tag_max_len = gr.Slider(
-             minimum=10, maximum=1000, step=10, label="max length of the tag"
+             minimum=10, maximum=1000, step=10, label="Max length of a tag", value=50
          )
      with gr.Row():
          model = gr.Dropdown(
              ["gpt-3.5-turbo-0613", "gpt-4-0613"],
-             label="model_name",
+             label="Model Name:",
              value="gpt-3.5-turbo-0613",
          )
          nr_call = gr.Number(label="nr_of_calls", minimum=1, value=1, precision=0)
      with gr.Row():
          prompt = gr.TextArea(
              placeholder="Enter your prompt here...",
-             label="prompt",
+             label="Prompt: (Please include the default prompt at the end)",
              value=default_prompt,
          )
-         text = gr.TextArea(placeholder="Enter your text here...", label="text")
+         text = gr.TextArea(placeholder="Enter your text here...", label="Text")
      btn = gr.Button("Submit")
      output = gr.HighlightedText()
      btn.click(
@@ -73,6 +84,7 @@ with gr.Blocks(theme=gr.themes.Default(text_size=gr.themes.sizes.text_lg)) as de
              tag_max_len,
              text,
              prompt,
+             key
          ],
          outputs=output,
      )
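
The README added in this commit describes a three-step flow: extraction, indexing, and validation. As a rough illustration of the last two steps only, the sketch below locates each candidate tag verbatim in the source text and then applies length and regex checks. `locate_tag` and `validate_tags` are hypothetical names chosen for illustration, not GPTagger's API, and the hard-coded candidate list stands in for GPT output.

```python
import re
from typing import List, Optional, Tuple


def locate_tag(text: str, tag: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the first verbatim occurrence of tag, or None."""
    if not tag:
        return None
    start = text.find(tag)
    if start == -1:
        return None
    return start, start + len(tag)


def validate_tags(
    text: str,
    candidates: List[str],
    max_len: int = 128,
    pattern: str = r"\d",
) -> List[Tuple[str, Tuple[int, int]]]:
    """Keep candidates that occur verbatim in text and pass the length and regex checks."""
    kept = []
    for tag in candidates:
        span = locate_tag(text, tag)
        if span is None:
            continue  # not found verbatim: paraphrased, fabricated, or reformatted
        if len(tag) > max_len:
            continue  # rule-based check: too long
        if re.search(pattern, tag) is None:
            continue  # rule-based check: does not match the expected pattern
        kept.append((tag, span))
    return kept


# Candidates mirror the README example; the list stands in for GPT responses.
text = "I earn $1000 this week!"
candidates = ["one thousand dollar", "$999999", "$1,000", "$1000"]
result = validate_tags(text, candidates)
print(result)  # [('$1000', (7, 12))]
```

Only the candidate that appears verbatim in the text survives, which is the property the README's three failure cases violate. An ML-based validator, as the README mentions, would be an additional check on top of these rules and is not shown here.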