Spaces: hnliu/GPTagger

Haonan Liu committed on Aug 11, 2023

Commit 248ece2 (1 parent: 2e7eced)

update app and add doc


Files changed (2)

README.md +78 -1
app.py +24 -12

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: GPTagger
-emoji: 📉
 colorFrom: red
 colorTo: pink
 sdk: gradio
@@ -11,3 +11,80 @@ license: gpl-3.0
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: GPTagger
+emoji: 🏷️
 colorFrom: red
 colorTo: pink
 sdk: gradio
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# [GPTagger](https://github.com/hnliu-git/GPTagger) :label:
+GPT Tagger is a powerful text tagger built on GPT models: it lets you extract tags from a given text by leveraging GPT's capabilities. However, using GPT as a text tagger is not trivial, because GPT tends to generate non-existent, fabricated, or processed text. To mitigate this, GPT Tagger checks that the generated tags are derived from the input text while still allowing GPT to process the extracted tags to some extent.
+Below is an example of how GPT may respond incorrectly.
+```md
+Text: "I earn $1000 this week!"
+Prompt: "Extract how much he/she earns"
+# Non-existent text
+GPT: "one thousand dollar"
+# Made-up text
+GPT: "$999999"
+# Processed text
+GPT: "$1,000"
+```
+## Introduction
+These incorrect responses highlight the importance of a reliable tag extraction tool like GPT Tagger. To avoid them, GPT Tagger follows three main steps:
+1. 🕵️‍♀️ Extraction: GPT Tagger sniffs out all possible tags by following your instructions to GPT.
+2. 🔍 Indexing: It spots the exact locations of these tags within the text.
+3. ✅ Validation: GPT Tagger's trusty validator checks whether the extracted tags pass the rule-based and ML-based checks.
+Check the example above to see how we extract ingredients from a yummy recipe text. 😋
+## Features ✨
+### Scale up GPT annotators and switch easily between GPT-3.5 and GPT-4
+- Want higher precision? Try GPT-4!
+- Want higher recall? Scale up the number of GPT annotators!
+### Instead of crafting a perfect prompt, use validators to shave off bad extractions
+- Simple validators: length, regex, ...
+- ML validator: GPT validator (think of it as a chain of GPTs!)
+## How to Use 🚀
+### Setup
+```shell
+make install
+export OPENAI_API_KEY=<your-key>
+```
+### Pre-defined NER pipeline
+The easiest way to dive into GPT Tagger is through the Gradio web demo! Fire it up with a single command:
+```shell
+poetry run python GPTagger/app.py
+```
+If you prefer having the power of GPT Tagger at your fingertips in Python, check out this snippet:
+```python
+from pathlib import Path
+from GPTagger import *
+cfg = NerConfig(
+    tag_name='date',
+    tag_regex=r"\d",
+    tag_max_len=128,
+)
+prompt = PromptTemplate.from_template(Path('<path-to-prompt>').read_text())
+pipeline = NerPipeline.from_config(cfg)
+doc = Path('<path-to-doc>').read_text()
+tags = pipeline(doc, prompt)
+```
+### Build Custom Pipelines 🎉
+We believe the possibilities of using GPT as a text tagger are endless, and we invite you to contribute your own custom pipelines. Together, we'll unlock the true potential of GPT Tagger and make text tagging a better experience.
+Leave a star if you find GPTagger useful for your product or company! 🌟
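The Indexing and Validation steps described in the README above can be illustrated with a minimal, self-contained sketch. It is not GPTagger's implementation: `Span`, `index_candidates`, and `validate` are hypothetical helpers, and the sketch assumes a tag counts as grounded only when it appears verbatim in the input, which is stricter than the README's allowance for GPT to process tags "to some extent". The candidate lists stand in for the outputs of several GPT annotator calls, which are unioned to raise recall.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A tag located in the source text by character offsets (hypothetical type)."""
    tag_name: str
    start: int
    end: int
    text: str


def index_candidates(text: str, candidates: list[str], tag_name: str) -> list[Span]:
    """Indexing: keep only candidates that occur verbatim in `text`.

    Every occurrence is recorded, so a candidate that appears twice yields two
    spans. Candidates that GPT rephrased or invented never match and are dropped.
    Duplicates coming from several annotator calls collapse into one span.
    """
    spans = set()
    for cand in candidates:
        cand = cand.strip()
        if not cand:
            continue
        for m in re.finditer(re.escape(cand), text):
            spans.add(Span(tag_name, m.start(), m.end(), m.group()))
    return sorted(spans, key=lambda s: (s.start, s.end))


def validate(spans: list[Span], max_len: int, pattern: str | None = None) -> list[Span]:
    """Validation: simple rule-based checks on tag length and an optional regex."""
    kept = []
    for s in spans:
        if len(s.text) > max_len:
            continue
        if pattern is not None and re.search(pattern, s.text) is None:
            continue
        kept.append(s)
    return kept


if __name__ == "__main__":
    text = "I earn $1000 this week!"
    # Outputs of two hypothetical annotator calls, flattened into one candidate list.
    annotator_outputs = [["one thousand dollar", "$1000"], ["$999999", "$1,000"]]
    candidates = [c for call in annotator_outputs for c in call]
    spans = validate(index_candidates(text, candidates, "salary"), max_len=128, pattern=r"\d")
    print(spans)
    # -> [Span(tag_name='salary', start=7, end=12, text='$1000')]
```

Only `$1000` survives: the non-existent, made-up, and processed answers from the README example never occur in the input, so indexing discards them before validation runs.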

app.py CHANGED

@@ -1,3 +1,4 @@
 import gradio as gr
 from GPTagger import *
@@ -14,20 +15,20 @@ TEXT:
 def ner(
     model: str,
-    nr_call: int,
     tag_name: str,
     tag_max_len: int,
     text: str,
     prompt: str,
 ):
-    cfg = NerConfig(
         tag_name=tag_name,
         model=model,
-        nr_calls=nr_call,
-        tag_max_len=tag_max_len,
     )
-    ner_pipeline = NerPipeline.from_config(cfg)
     template = PromptTemplate.from_template(prompt)
     extractions = ner_pipeline(text, template, "")
@@ -42,26 +43,36 @@ def ner(
     return {"text": text, "entities": output}
-with gr.Blocks(theme=gr.themes.Default(text_size=gr.themes.sizes.text_lg)) as demo:
     with gr.Row():
-        tag_name = gr.Textbox(label="tag name")
         tag_max_len = gr.Slider(
-            minimum=10, maximum=1000, step=10, label="max length of the tag"
         )
     with gr.Row():
         model = gr.Dropdown(
             ["gpt-3.5-turbo-0613", "gpt-4-0613"],
-            label="model_name",
             value="gpt-3.5-turbo-0613",
         )
         nr_call = gr.Number(label="nr_of_calls", minimum=1, value=1, precision=0)
     with gr.Row():
         prompt = gr.TextArea(
             placeholder="Enter your prompt here...",
-            label="prompt",
             value=default_prompt,
         )
-        text = gr.TextArea(placeholder="Enter your text here...", label="text")
     btn = gr.Button("Submit")
     output = gr.HighlightedText()
     btn.click(
@@ -73,6 +84,7 @@ with gr.Blocks(theme=gr.themes.Default(text_size=gr.themes.sizes.text_lg)) as de
             tag_max_len,
             text,
             prompt,
         ],
         outputs=output,
     )

+import os
 import gradio as gr
 from GPTagger import *
 def ner(
     model: str,
+    nr_calls: int,
     tag_name: str,
     tag_max_len: int,
     text: str,
     prompt: str,
+    key: str,
 ):
+    os.environ['OPENAI_API_KEY'] = key
+    ner_pipeline = NerPipeline(
         tag_name=tag_name,
+        nr_calls=nr_calls,
         model=model,
+        tag_max_len=tag_max_len
     )
     template = PromptTemplate.from_template(prompt)
     extractions = ner_pipeline(text, template, "")
     return {"text": text, "entities": output}
+with gr.Blocks() as demo:
+    gr.Markdown(
+        """
+        # GPTagger 🏷️
+        [GPTagger](https://github.com/hnliu-git/GPTagger) is a powerful text tagger that makes use of the GPT model. This tool allows you to extract tags from a given text by leveraging the capabilities of GPT.
+        Simply specify the tag you want to extract from the text using prompt, you will get them highlighted in the output.
+        """
+    )
+    with gr.Row():
+        key = gr.Textbox(label='OpenAI API Key:')
     with gr.Row():
+        tag_name = gr.Textbox(label="Tag Name:", placeholder='Enter the tag you want to extract')
         tag_max_len = gr.Slider(
+            minimum=10, maximum=1000, step=10, label="Max length of a tag", value=50
         )
     with gr.Row():
         model = gr.Dropdown(
             ["gpt-3.5-turbo-0613", "gpt-4-0613"],
+            label="Model Name:",
             value="gpt-3.5-turbo-0613",
         )
         nr_call = gr.Number(label="nr_of_calls", minimum=1, value=1, precision=0)
     with gr.Row():
         prompt = gr.TextArea(
             placeholder="Enter your prompt here...",
+            label="Prompt: (Please include the default prompt at the end)",
             value=default_prompt,
         )
+        text = gr.TextArea(placeholder="Enter your text here...", label="Text")
     btn = gr.Button("Submit")
     output = gr.HighlightedText()
     btn.click(
             tag_max_len,
             text,
             prompt,
+            key
         ],
         outputs=output,
     )
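The updated `ner` function returns `{"text": text, "entities": output}`, the dictionary form that `gr.HighlightedText` accepts, where each entity is a dict with `entity`, `start`, and `end` keys. How `output` is built falls in lines the diff elides, so the helper below (`to_highlighted`, a hypothetical name) is only a sketch of one way to produce that structure, assuming the pipeline yields `(start, end)` character offsets for each extraction.

```python
from __future__ import annotations


def to_highlighted(text: str, tag_name: str, spans: list[tuple[int, int]]) -> dict:
    """Build the dict form accepted by gr.HighlightedText (hypothetical helper).

    Duplicate spans (e.g. the same tag found by several annotator calls) are
    collapsed, and offsets that fall outside the text are discarded.
    """
    entities = [
        {"entity": tag_name, "start": start, "end": end}
        for start, end in sorted(set(spans))
        if 0 <= start < end <= len(text)
    ]
    return {"text": text, "entities": entities}


if __name__ == "__main__":
    text = "I earn $1000 this week!"
    print(to_highlighted(text, "salary", [(7, 12), (7, 12)]))
    # -> {'text': 'I earn $1000 this week!', 'entities': [{'entity': 'salary', 'start': 7, 'end': 12}]}
```

Returning offsets rather than substrings lets the component highlight each occurrence in place, even when the same tag text appears more than once.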