Remove env setup. Update Readme.
- Makefile +0 -16
- README.md +38 -1
- app.py +8 -17
- images/LLMLingua_logo.png +0 -0
- llmlingua/__init__.py +0 -4
- llmlingua/prompt_compressor.py +0 -0
- llmlingua/utils.py +0 -98
- llmlingua/version.py +0 -14
- setup.cfg +0 -28
- setup.py +0 -70
Makefile
DELETED
@@ -1,16 +0,0 @@
-.PHONY: install style test
-
-PYTHON := python
-CHECK_DIRS := llmlingua tests
-
-install:
-	@${PYTHON} setup.py bdist_wheel
-	@${PYTHON} -m pip install dist/sdtools*
-
-style:
-	black $(CHECK_DIRS)
-	isort -rc $(CHECK_DIRS)
-	flake8 $(CHECK_DIRS)
-
-test:
-	@${PYTHON} -m pytest -n auto --dist=loadfile -s -v ./tests/
README.md
CHANGED
@@ -9,5 +9,42 @@ app_file: app.py
 pinned: false
 license: cc-by-nc-sa-4.0
 ---
-
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+LLMLingua-2 is one of the branches of the [LLMLingua Series](https://llmlingua.com/). Please check the links below for more information.
+<div style="display: flex; align-items: center;">
+    <div style="width: 100px; margin-right: 10px; height:auto;" align="left">
+        <img src="images/LLMLingua_logo.png" alt="LLMLingua" width="100" align="left">
+    </div>
+    <div style="flex-grow: 1;" align="center">
+        <h2 align="center">LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression</h2>
+    </div>
+</div>
+
+<p align="center">
+    | <a href="https://llmlingua.com/"><b>Project Page</b></a> |
+    <a href="https://aclanthology.org/2023.emnlp-main.825/"><b>LLMLingua</b></a> |
+    <a href="https://arxiv.org/abs/2310.06839"><b>LongLLMLingua</b></a> |
+    <a href="https://arxiv.org/abs/2403.12968"><b>LLMLingua-2</b></a> |
+    <a href="https://huggingface.co/spaces/microsoft/LLMLingua"><b>LLMLingua Demo</b></a> |
+    <a href="https://huggingface.co/spaces/microsoft/LLMLingua-2"><b>LLMLingua-2 Demo</b></a> |
+</p>
+
+
+## Brief Introduction
+
+**LLMLingua** utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
+
+- [LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://aclanthology.org/2023.emnlp-main.825/) (EMNLP 2023)<br>
+  _Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
+
+**LongLLMLingua** mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.
+
+- [LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (ICLR ME-FoMo 2024)<br>
+  _Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
+
+**LLMLingua-2**, a small yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.
+
+- [LLMLingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://arxiv.org/abs/2403.12968) (Under Review)<br>
+  _Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_
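
The introduction above describes the compression API at a high level. For reference, here is a minimal sketch of how the published llmlingua package is typically invoked for LLMLingua-2; the model name and arguments follow the public LLMLingua repository README and are assumptions, not this Space's exact code:

```python
# A minimal sketch, assuming the published llmlingua package
# (pip install llmlingua) rather than this Space's vendored copy.
# Model name and arguments follow the LLMLingua repository README;
# treat them as assumptions, not this app's exact configuration.
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # enable the LLMLingua-2 token-classification path
)

result = compressor.compress_prompt(
    "John: So, um, I've been thinking about the project, you know...",
    rate=0.33,  # keep roughly one third of the tokens
)
print(result["compressed_prompt"])
```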
app.py
CHANGED
@@ -1,9 +1,3 @@
-
-# build the environment
-import sys
-import subprocess
-subprocess.run([sys.executable, "-m", "pip", "install", "-e", "."])
-
 # import the required libraries
 import gradio as gr
 import json

@@ -60,15 +54,12 @@ def compress(original_prompt, compression_rate, base_model="xlm-roberta-large",
 
 
 title = "LLMLingua-2"
-
-
-
-
-
-
-    </div>
-    """
-)
+
+header = """# LLMLingua-2: Efficient and Faithful Task-Agnostic Prompt Compression via Data Distillation
+_Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruehle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_<br/>
+[[project page]](https://llmlingua.com/llmlingua2.html) [[paper]](https://arxiv.org/abs/2403.12968) [[code]](https://github.com/microsoft/LLMLingua)
+"""
+
 theme = "soft"
 css = """#anno-img .mask {opacity: 0.5; transition: all 0.2s ease-in-out;}
 #anno-img .mask.active {opacity: 0.7}"""

@@ -76,8 +67,8 @@ css = """#anno-img .mask {opacity: 0.5; transition: all 0.2s ease-in-out;}
 original_prompt_text = """John: So, um, I've been thinking about the project, you know, and I believe we need to, uh, make some changes. I mean, we want the project to succeed, right? So, like, I think we should consider maybe revising the timeline.
 Sarah: I totally agree, John. I mean, we have to be realistic, you know. The timeline is, like, too tight. You know what I mean? We should definitely extend it.
 """
-
-with gr.Blocks(title=title, css=css) as app:
+
+with gr.Blocks(title=title, css=css) as app:
     gr.Markdown(header)
     with gr.Row():
         with gr.Column(scale=3):
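
The trimmed app.py keeps a standard Gradio Blocks layout around the `compress` function shown in the hunk header above. A self-contained sketch of that pattern follows; the component names and the `compress` stub body are hypothetical stand-ins, not the Space's actual implementation:

```python
# Sketch of the Blocks layout app.py follows after this commit.
# title/header/theme/css mirror the diff above; everything else
# (component names, the compress stub) is a hypothetical stand-in.
import gradio as gr

title = "LLMLingua-2"
header = "# LLMLingua-2: Efficient and Faithful Task-Agnostic Prompt Compression"
css = """#anno-img .mask {opacity: 0.5; transition: all 0.2s ease-in-out;}
#anno-img .mask.active {opacity: 0.7}"""

def compress(original_prompt: str, compression_rate: float) -> str:
    # placeholder: the real app calls the LLMLingua-2 compressor here
    return original_prompt

with gr.Blocks(title=title, css=css, theme="soft") as app:
    gr.Markdown(header)
    with gr.Row():
        with gr.Column(scale=3):
            prompt_box = gr.Textbox(label="Original Prompt", lines=10)
            rate = gr.Slider(0.1, 1.0, value=0.33, label="Compression rate")
            run = gr.Button("Compress")
        with gr.Column(scale=3):
            output_box = gr.Textbox(label="Compressed Prompt", lines=10)
    run.click(compress, inputs=[prompt_box, rate], outputs=output_box)

if __name__ == "__main__":
    app.launch()
```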
images/LLMLingua_logo.png
ADDED
llmlingua/__init__.py
DELETED
@@ -1,4 +0,0 @@
-# Copyright (c) 2024 Microsoft
-# Licensed under The cc-by-nc-sa-4.0 License [see LICENSE for details]
-# flake8: noqa
-from .prompt_compressor import PromptCompressor
llmlingua/prompt_compressor.py
DELETED
The diff for this file is too large to render.
llmlingua/utils.py
DELETED
@@ -1,98 +0,0 @@
-import torch
-from torch.utils.data import Dataset
-import random, os
-import numpy as np
-import torch
-import string
-
-class TokenClfDataset(Dataset):
-    def __init__(
-        self,
-        texts,
-        max_len=512,
-        tokenizer=None,
-        model_name="bert-base-multilingual-cased",
-    ):
-        self.len = len(texts)
-        self.texts = texts
-        self.tokenizer = tokenizer
-        self.max_len = max_len
-        self.model_name = model_name
-        if "bert-base-multilingual-cased" in model_name:
-            self.cls_token = "[CLS]"
-            self.sep_token = "[SEP]"
-            self.unk_token = "[UNK]"
-            self.pad_token = "[PAD]"
-            self.mask_token = "[MASK]"
-        elif "xlm-roberta-large" in model_name:
-            self.bos_token = "<s>"
-            self.eos_token = "</s>"
-            self.sep_token = "</s>"
-            self.cls_token = "<s>"
-            self.unk_token = "<unk>"
-            self.pad_token = "<pad>"
-            self.mask_token = "<mask>"
-        else:
-            raise NotImplementedError()
-
-    def __getitem__(self, index):
-        text = self.texts[index]
-        tokenized_text = self.tokenizer.tokenize(text)
-
-        tokenized_text = (
-            [self.cls_token] + tokenized_text + [self.sep_token]
-        )  # add special tokens
-
-        if len(tokenized_text) > self.max_len:
-            tokenized_text = tokenized_text[: self.max_len]
-        else:
-            tokenized_text = tokenized_text + [
-                self.pad_token for _ in range(self.max_len - len(tokenized_text))
-            ]
-
-        attn_mask = [1 if tok != self.pad_token else 0 for tok in tokenized_text]
-
-        ids = self.tokenizer.convert_tokens_to_ids(tokenized_text)
-
-        return {
-            "ids": torch.tensor(ids, dtype=torch.long),
-            "mask": torch.tensor(attn_mask, dtype=torch.long),
-        }
-
-    def __len__(self):
-        return self.len
-
-
-def seed_everything(seed: int):
-    random.seed(seed)
-    os.environ["PYTHONHASHSEED"] = str(seed)
-    np.random.seed(seed)
-    torch.manual_seed(seed)
-    torch.cuda.manual_seed(seed)
-    torch.backends.cudnn.deterministic = True
-    torch.backends.cudnn.benchmark = False
-
-def is_begin_of_new_word(token, model_name, force_tokens, token_map):
-    if "bert-base-multilingual-cased" in model_name:
-        if token.lstrip("##") in force_tokens or token.lstrip("##") in set(token_map.values()):
-            return True
-        return not token.startswith("##")
-    elif "xlm-roberta-large" in model_name:
-        if token in string.punctuation or token in force_tokens or token in set(token_map.values()):
-            return True
-        return token.startswith("▁")
-    else:
-        raise NotImplementedError()
-
-def replace_added_token(token, token_map):
-    for ori_token, new_token in token_map.items():
-        token = token.replace(new_token, ori_token)
-    return token
-
-def get_pure_token(token, model_name):
-    if "bert-base-multilingual-cased" in model_name:
-        return token.lstrip("##")
-    elif "xlm-roberta-large" in model_name:
-        return token.lstrip("▁")
-    else:
-        raise NotImplementedError()
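
For context on the removed code: TokenClfDataset pads or truncates each tokenized text to max_len and returns fixed-size id/mask tensors, so it plugs straight into a PyTorch DataLoader. A hedged usage sketch, assuming the pre-commit llmlingua package is importable; the tokenizer checkpoint and batch size are illustrative assumptions, not repository values:

```python
# Illustrative sketch: how the removed TokenClfDataset was typically consumed.
# Assumes the pre-commit llmlingua package (with utils.py above) is importable;
# tokenizer checkpoint and batch size are assumptions, not repository values.
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

from llmlingua.utils import TokenClfDataset  # module deleted by this commit

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
texts = [
    "John: I think we should consider maybe revising the timeline.",
    "Sarah: I totally agree, the timeline is too tight.",
]

dataset = TokenClfDataset(
    texts, max_len=512, tokenizer=tokenizer, model_name="xlm-roberta-large"
)
loader = DataLoader(dataset, batch_size=2)

for batch in loader:
    # Each batch stacks the per-example tensors produced by __getitem__.
    print(batch["ids"].shape, batch["mask"].shape)  # torch.Size([2, 512]) each
```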
llmlingua/version.py
DELETED
@@ -1,14 +0,0 @@
-# Copyright (c) 2023 Microsoft
-# Licensed under The MIT License [see LICENSE for details]
-
-_MAJOR = "0"
-_MINOR = "1"
-# On master and in a nightly release the patch should be one ahead of the last
-# released build.
-_PATCH = "6"
-# This is mainly for nightly builds which have the suffix ".dev$DATE". See
-# https://semver.org/#is-v123-a-semantic-version for the semantics.
-_SUFFIX = ""
-
-VERSION_SHORT = "{0}.{1}".format(_MAJOR, _MINOR)
-VERSION = "{0}.{1}.{2}{3}".format(_MAJOR, _MINOR, _PATCH, _SUFFIX)
setup.cfg
DELETED
@@ -1,28 +0,0 @@
-[isort]
-default_section = FIRSTPARTY
-ensure_newline_before_comments = True
-force_grid_wrap = 0
-include_trailing_comma = True
-known_first_party = sdtools
-known_third_party =
-    imblearn
-    numpy
-    pandas
-    pytorch-tabnet
-    scipy
-    sklearn
-    torch
-    torchaudio
-    torchvision
-    torch_xla
-    tqdm
-    xgboost
-
-line_length = 119
-lines_after_imports = 2
-multi_line_output = 3
-use_parentheses = True
-
-[flake8]
-ignore = E203, E501, E741, W503, W605
-max-line-length = 119
setup.py
DELETED
@@ -1,70 +0,0 @@
-# Copyright (c) 2023 Microsoft
-# Licensed under The MIT License [see LICENSE for details]
-
-from setuptools import find_packages, setup
-
-# PEP0440 compatible formatted version, see:
-# https://www.python.org/dev/peps/pep-0440/
-#
-# release markers:
-#   X.Y
-#   X.Y.Z    # For bugfix releases
-#
-# pre-release markers:
-#   X.YaN    # Alpha release
-#   X.YbN    # Beta release
-#   X.YrcN   # Release Candidate
-#   X.Y      # Final release
-
-# version.py defines the VERSION and VERSION_SHORT variables.
-# We use exec here so we don't import allennlp whilst setting up.
-VERSION = {}  # type: ignore
-with open("llmlingua/version.py", "r") as version_file:
-    exec(version_file.read(), VERSION)
-
-INSTALL_REQUIRES = [
-    "transformers>=4.26.0",
-    "accelerate",
-    "torch",
-    "tiktoken",
-    "nltk",
-    "numpy",
-]
-QUANLITY_REQUIRES = [
-    "black==21.4b0",
-    "flake8>=3.8.3",
-    "isort>=5.5.4",
-    "pre-commit",
-    "pytest",
-    "pytest-xdist",
-]
-DEV_REQUIRES = INSTALL_REQUIRES + QUANLITY_REQUIRES
-
-setup(
-    name="llmlingua",
-    version=VERSION["VERSION"],
-    author="The LLMLingua team",
-    author_email="[email protected]",
-    description="To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.",
-    long_description=open("README.md", encoding="utf8").read(),
-    long_description_content_type="text/markdown",
-    keywords="Prompt Compression, LLMs, Inference Acceleration, Black-box LLMs, Efficient LLMs",
-    license="MIT License",
-    url="https://github.com/microsoft/LLMLingua",
-    classifiers=[
-        "Intended Audience :: Science/Research",
-        "Development Status :: 3 - Alpha",
-        "Programming Language :: Python :: 3",
-        "Topic :: Scientific/Engineering :: Artificial Intelligence",
-    ],
-    package_dir={"": "."},
-    packages=find_packages("."),
-    extras_require={
-        "dev": DEV_REQUIRES,
-        "quality": QUANLITY_REQUIRES,
-    },
-    install_requires=INSTALL_REQUIRES,
-    include_package_data=True,
-    python_requires=">=3.8.0",
-    zip_safe=False,
-)