Update README.md
README.md CHANGED
@@ -9,14 +9,15 @@ original model [weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/we
 
 This model is a quantized (miniaturized) version of the original model (21.42GB).
 
-There are currently two well-known quantization
-(1)GPTQ
+There are currently two well-known quantized versions of the original model.
+(1) GPTQ version (this model, 6.3GB)
 The size is smaller and the execution speed is faster, but the inference performance may be a little worse than the original model.
 At least one GPU is currently required due to a limitation of the Accelerate library.
 So this model cannot be run on the free tier of Hugging Face Spaces.
 You need the autoGPTQ library to use this model.
 
-(2)
+(2) llama.cpp version (gguf) ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03GB)
+Created by mmnga.
 You can use the gguf model with llama.cpp on a CPU-only machine.
 But the gguf model may be a little slower than GPTQ, especially for long text.
 
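For reference, a minimal sketch of loading the GPTQ version with the autoGPTQ library, as the updated README describes. The repo id below is a placeholder (the actual id of this repository is not stated here), and `use_safetensors` may need adjusting depending on how the weights are stored; as noted above, a GPU is required.

```python
# Minimal sketch: loading a GPTQ-quantized model with autoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "<this-repo-id>"  # placeholder -- substitute this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# A GPU is required; "cuda:0" places the quantized weights on the first GPU.
# Adjust use_safetensors if the weights are not stored as safetensors.
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_safetensors=True)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```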
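And a minimal sketch of running the gguf version on a CPU-only machine, here via the llama-cpp-python bindings rather than the llama.cpp CLI. The filename is hypothetical; download an actual .gguf file from the mmnga repository linked above first.

```python
# Minimal sketch: CPU-only inference with llama-cpp-python.
# The filename is hypothetical -- pick a real .gguf file from
# https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf
from llama_cpp import Llama

llm = Llama(model_path="./matsuolab-weblab-10b-instruction-sft.gguf")
out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```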