Update README.md
README.md CHANGED
@@ -9,14 +9,15 @@ original model [weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/we
 
 This model is a quantized (miniaturized) version of the original model (21.42GB).
 
-There are currently two well-known quantization
-(1)GPTQ
+There are currently two well-known quantized versions of the original model.
+(1) GPTQ version (this model, 6.3GB)
 The size is smaller and the execution speed is faster, but the inference performance may be a little worse than the original model.
 At least one GPU is currently required due to a limitation of the Accelerate library.
 So this model cannot be run on the free tier of Hugging Face Spaces.
 You need the autoGPTQ library to use this model.
 
-(2)
+(2) llama.cpp version (gguf) ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03GB)
+Created by mmnga.
 You can use the gguf model with llama.cpp on a CPU-only machine.
 But the gguf model may be a little slower than GPTQ, especially for long text.
 
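For reference, a minimal sketch of loading the GPTQ version with the autoGPTQ library, as the updated README describes. The repo id below is a placeholder (the actual id of this repository is not stated here), and `use_safetensors` may need adjusting depending on how the weights are stored; as noted above, a GPU is required.

```python
# Minimal sketch: loading a GPTQ-quantized model with autoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "<this-repo-id>"  # placeholder -- substitute this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# A GPU is required; "cuda:0" places the quantized weights on the first GPU.
# Adjust use_safetensors if the weights are not stored as safetensors.
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_safetensors=True)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```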
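And a minimal sketch of running the gguf version on a CPU-only machine, here via the llama-cpp-python bindings rather than the llama.cpp CLI. The filename is hypothetical; download an actual .gguf file from the mmnga repository linked above first.

```python
# Minimal sketch: CPU-only inference with llama-cpp-python.
# The filename is hypothetical -- pick a real .gguf file from
# https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf
from llama_cpp import Llama

llm = Llama(model_path="./matsuolab-weblab-10b-instruction-sft.gguf")
out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```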