PetraAI committed on
Commit a04e486 · 1 Parent(s): 8eddc47

Update README.md

Files changed (1)
  1. README.md +29 -32
README.md CHANGED
@@ -1,34 +1,31 @@
- <h1 align="center">AutoGPTQ</h1>
- <p align="center">An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.</p>
- <p align="center">
- <a href="https://github.com/PanQiWei/AutoGPTQ/releases">
- <img alt="GitHub release" src="https://img.shields.io/github/release/PanQiWei/AutoGPTQ.svg">
- </a>
- <a href="https://pypi.org/project/auto-gptq/">
- <img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dd/auto-gptq">
- </a>
- </p>
- <h4 align="center">
- <p>
- <b>English</b> |
- <a href="https://github.com/PanQiWei/AutoGPTQ/blob/main/README_zh.md">中文</a>
- </p>
- </h4>
-
- *<center>📣 Long time no see! 👋 Architecture upgrade, performance optimization and more new features will come in July and August, stay tune! 🥂</center>*
-
- ## News or Update
-
- - 2023-08-21 - (News) - Team of Qwen officially released 4bit quantized version of Qwen-7B based on `auto-gptq`, and provided [a detailed benchmark results](https://huggingface.co/Qwen/Qwen-7B-Chat-Int4#%E9%87%8F%E5%8C%96-quantization)
- - 2023-08-06 - (Update) - Support exllama's q4 CUDA kernel to have at least 1.3x speed up for int4 quantized models when doing inference.
- - 2023-08-04 - (Update) - Support RoCm so that AMD GPU users can use auto-gptq with CUDA extensions.
- - 2023-07-26 - (Update) - An elegant [PPL benchmark script](examples/benchmark/perplexity.py) to get results that can be fairly compared with other libraries such as `llama.cpp`.
- - 2023-06-05 - (Update) - Integrate with 🤗 peft to use gptq quantized model to train adapters, support LoRA, AdaLoRA, AdaptionPrompt, etc.
- - 2023-05-30 - (Update) - Support download/upload quantized model from/to 🤗 Hub.
-
- *For more histories please turn to [here](docs/NEWS_OR_UPDATE.md)*
-
- ## Performance Comparison
+ ---
+ license: apache-2.0
+ datasets:
+ - PetraAI/PetraAI
+ language:
+ - ar
+ - en
+ - ch
+ - zh
+ metrics:
+ - accuracy
+ - bertscore
+ - bleu
+ - chrf
+ - code_eval
+ - brier_score
+ tags:
+ - chemistry
+ - biology
+ - finance
+ - legal
+ - music
+ - code
+ - art
+ - climate
+ - medical
+ - text-generation-inference
+ ---
 
  ### Inference Speed
  > The result is generated using [this script](examples/benchmark/generation_speed.py), batch size of input is 1, decode strategy is beam search and enforce the model to generate 512 tokens, speed metric is tokens/s (the larger, the better).
@@ -336,4 +333,4 @@ pytest tests/ -s
  - Specially thanks **qwopqwop200**, for code in this project that relevant to quantization are mainly referenced from [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda).
 
 
- [![Star History Chart](https://api.star-history.com/svg?repos=PanQiwei/AutoGPTQ&type=Date)](https://star-history.com/#PanQiWei/AutoGPTQ&Date)
+ [![Star History Chart](https://api.star-history.com/svg?repos=PanQiwei/AutoGPTQ&type=Date)](https://star-history.com/#PanQiWei/AutoGPTQ&Date)