jeremy-costello committed
Commit f8f67af · 1 parent: 70cf23f
Update README.md

README.md CHANGED
@@ -1,3 +1,17 @@
---
license: apache-2.0
---
4-bit quantization of the vicuna-13b-v1.1 model.

The delta was added to the original LLaMA weights using FastChat.
Quantization and inference were done with GPTQ-for-LLaMa (commit 58c8ab4).

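As a rough illustration of what "adding the delta" means, the sketch below sums base and delta weights element by element. It is a toy stand-in that uses plain Python lists instead of tensors; the `Weights` type and the `apply_delta` name are hypothetical, and this is not FastChat's own code, which also deals with loading, saving, and tokenizer files.

```python
from typing import Dict, List

# Toy stand-in for a state dict: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Return base + delta, parameter by parameter (hypothetical helper)."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    merged: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise sum recovers the fine-tuned weights.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [1.0, 2.0]}
    delta = {"layer.weight": [0.5, -1.0]}
    print(apply_delta(base, delta))
```

Distributing only the delta lets the fine-tuned model be shared without redistributing the original weights.
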
Quantization args: $MODEL_DIRECTORY, c4, wbits 4, true-sequential, act-order, groupsize 128.
Inference args: $MODEL_DIRECTORY, wbits 4, groupsize 128, load $CHECKPOINT_FILE, device=0 (if using GPU).
You may need to adjust min_length and max_length to get better inference outputs.

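The argument lists above could be assembled into command lines roughly as follows. This is a sketch under assumptions: the `--flag` spellings and the script names `llama.py` and `llama_inference.py` are guesses at how GPTQ-for-LLaMa exposes these options at that commit, and the paths are placeholders. Only arguments named in this README are included, so check the repository for any others a real run needs.

```python
from __future__ import annotations

import shlex


def quantization_args(model_dir: str) -> list[str]:
    """Arguments listed in the README for quantization (flag spelling assumed)."""
    return [model_dir, "c4", "--wbits", "4", "--true-sequential",
            "--act-order", "--groupsize", "128"]


def inference_args(model_dir: str, checkpoint_file: str,
                   use_gpu: bool = True) -> list[str]:
    """Arguments listed in the README for inference (flag spelling assumed)."""
    args = [model_dir, "--wbits", "4", "--groupsize", "128",
            "--load", checkpoint_file]
    if use_gpu:
        # The README gives device=0 only for the GPU case.
        args += ["--device", "0"]
    return args


if __name__ == "__main__":
    model_dir = "/path/to/vicuna-13b-v1.1"  # placeholder for $MODEL_DIRECTORY
    checkpoint = "/path/to/checkpoint.pt"   # placeholder for $CHECKPOINT_FILE
    # Script names are assumptions; the commands are printed, not executed.
    print(shlex.join(["python", "llama.py", *quantization_args(model_dir)]))
    print(shlex.join(["python", "llama_inference.py",
                      *inference_args(model_dir, checkpoint)]))
```
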
The separator has been changed to \</s\>. A simple prompt is "Human: $REQUEST\</s\>Assistant:".

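A minimal sketch of building that single-turn prompt in Python. The `build_prompt` helper and `SEPARATOR` constant are hypothetical names; the format string itself comes from the line above, with `</s>` written unescaped.

```python
SEPARATOR = "</s>"  # separator stated in this README


def build_prompt(request: str) -> str:
    """Format a single-turn prompt as 'Human: <request></s>Assistant:'."""
    return f"Human: {request}{SEPARATOR}Assistant:"


if __name__ == "__main__":
    print(build_prompt("What is 4-bit quantization?"))
```

The model's reply is expected to continue after the trailing `Assistant:`.
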
Delta: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1
FastChat: https://github.com/lm-sys/FastChat
GPTQ-for-LLaMa: https://github.com/qwopqwop200/GPTQ-for-LLaMa