Commit 6826888 by jeremy-costello
Parent: 7728de5
Update README.md
Changed file: README.md
@@ -7,12 +7,12 @@ inference: false
 The delta was added to the original LLaMa weights using FastChat.
 Quantization and inference with GPTQ-For-LLaMa (commit 58c8ab4).
 
-Quantization args: $MODEL_DIRECTORY, c4, wbits 4, true-sequential, act-order, groupsize 128.
-Inference args: $MODEL_DIRECTORY, wbits 4, groupsize 128, load $CHECKPOINT_FILE
-You may have to change min_length and max_length for better inference outputs.
+Quantization args: $MODEL_DIRECTORY, c4, wbits 4, true-sequential, act-order, groupsize 128. \
+Inference args: $MODEL_DIRECTORY, wbits 4, groupsize 128, load $CHECKPOINT_FILE \
+Add arg device=0 if using GPU for inference. You may have to change min_length and max_length for better inference outputs.
 
 The separator has been changed to \</s\>. Simple prompt is "Human: $REQUEST\</s\>Assistant:".
 
-Delta: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1
-FastChat: https://github.com/lm-sys/FastChat
+Delta: https://huggingface.co/lmsys/vicuna-13b-delta-v1.1 \
+FastChat: https://github.com/lm-sys/FastChat \
 GTPQ-for-LLaMa: https://github.com/qwopqwop200/GPTQ-for-LLaMa
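The README lists its quantization and inference arguments and its prompt format only in shorthand. The Python sketch below shows one way to turn that shorthand into concrete command lines and a prompt string. It is a sketch under stated assumptions, not the author's method. The script names `llama.py` and `llama_inference.py` and the exact flag spellings (`--wbits`, `--true-sequential`, `--act-order`, `--groupsize`, `--load`, `--device`, `--min_length`, `--max_length`) are assumed from GPTQ-for-LLaMa's usual CLI. The `--save` and `--text` flags are also assumptions, because the README does not say how the checkpoint is written or how the prompt is passed. The paths and the min/max length values are hypothetical placeholders. Verify everything against commit 58c8ab4 before use. Note that the backslashes in `\</s\>` are Markdown escaping, so the literal separator is `</s>`.

```python
import shlex

# Hypothetical placeholders: substitute your own paths.
MODEL_DIRECTORY = "path/to/vicuna-13b"        # LLaMa weights with the delta applied
CHECKPOINT_FILE = "vicuna-13b-4bit-128g.pt"   # hypothetical quantized checkpoint name


def build_prompt(request: str) -> str:
    """Single-turn prompt using the </s> separator described in the README."""
    return f"Human: {request}</s>Assistant:"


def quantize_argv(model_dir: str, checkpoint: str) -> list[str]:
    """Quantization args from the README, mapped to assumed CLI flags."""
    return [
        "python", "llama.py", model_dir, "c4",  # "c4" = calibration dataset
        "--wbits", "4",
        "--true-sequential",
        "--act-order",
        "--groupsize", "128",
        "--save", checkpoint,  # assumption: README does not name this flag
    ]


def inference_argv(model_dir: str, checkpoint: str, text: str,
                   use_gpu: bool = True,
                   min_length: int = 10, max_length: int = 50) -> list[str]:
    """Inference args from the README; length values are illustrative only."""
    argv = [
        "python", "llama_inference.py", model_dir,
        "--wbits", "4",
        "--groupsize", "128",
        "--load", checkpoint,
        "--text", text,  # assumption: prompt passed via --text
        "--min_length", str(min_length),
        "--max_length", str(max_length),
    ]
    if use_gpu:
        # README: "Add arg device=0 if using GPU for inference."
        argv += ["--device", "0"]
    return argv


if __name__ == "__main__":
    prompt = build_prompt("What is 4-bit quantization?")
    # shlex.join quotes each argument so the commands can be pasted into a shell.
    print(shlex.join(quantize_argv(MODEL_DIRECTORY, CHECKPOINT_FILE)))
    print(shlex.join(inference_argv(MODEL_DIRECTORY, CHECKPOINT_FILE, prompt)))
```

The sketch builds argument lists rather than raw strings so that a prompt containing spaces or `</s>` is quoted correctly. If outputs are too short or get cut off, adjust `min_length` and `max_length` as the README suggests.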