Update README.md
README.md CHANGED
@@ -10,15 +10,15 @@ original model [weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/we
This model is a quantized (miniaturized) version of the original model (21.42 GB).

There are currently two well-known quantization methods.
-(1) GPTQ (this model, 6.3 GB)
+(1) GPTQ model (this model, 6.3 GB)
The size is smaller and the execution speed is faster, but the inference performance may be a little worse than that of the original model.
At least one GPU is currently required due to a limitation of the Accelerate library.
So this model cannot be run on the free tier of Hugging Face Spaces.
You need the AutoGPTQ library to use this model.

-(2) gguf ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03 GB), created by mmnga.
+(2) gguf model ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03 GB), created by mmnga.
You can use the gguf model with llama.cpp on a CPU-only machine.
-
+But the gguf model may be a little slower than GPTQ, especially for long text.

### sample code
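A minimal sketch of the AutoGPTQ usage described above; the repo ID, prompt, and generation settings are placeholders, not this model's documented values:

```python
# Minimal sketch: GPTQ inference via the AutoGPTQ library.
# A GPU is required, as noted above (Accelerate limitation).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "your-namespace/weblab-10b-instruction-sft-GPTQ"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

prompt = "What is GPTQ quantization?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```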
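For the gguf alternative in (2), CPU-only inference goes through llama.cpp; a minimal sketch using its llama-cpp-python binding, where the file name and parameters are assumptions:

```python
# Minimal sketch: CPU-only gguf inference via llama-cpp-python,
# the Python binding for llama.cpp. The file path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./weblab-10b-instruction-sft.gguf", n_ctx=2048)
out = llm("What is quantization?", max_tokens=64)
print(out["choices"][0]["text"])
```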