Update README.md
README.md CHANGED
@@ -10,15 +10,15 @@ original model [weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/we
This model is a quantized (miniaturized) version of the original model (21.42 GB).

There are currently two well-known quantization methods.
-(1) GPTQ (this model, 6.3 GB)
+(1) GPTQ model (this model, 6.3 GB)
The size is smaller and the execution speed is faster, but the inference performance may be a little worse than that of the original model.
At least one GPU is currently required due to a limitation of the Accelerate library.
So this model cannot be run on the free tier of Hugging Face Spaces.
You need the AutoGPTQ library to use this model.

-(2) gguf ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03 GB), created by mmnga.
+(2) gguf model ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03 GB), created by mmnga.
You can use the gguf model with llama.cpp on a CPU-only machine.
-
+But the gguf model may be a little slower than GPTQ, especially for long text.

### sample code
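A minimal sketch of the AutoGPTQ usage described above; the repo ID, prompt, and generation settings are placeholders, not this model's documented values:

```python
# Minimal sketch: GPTQ inference via the AutoGPTQ library.
# A GPU is required, as noted above (Accelerate limitation).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "your-namespace/weblab-10b-instruction-sft-GPTQ"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

prompt = "What is GPTQ quantization?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```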
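For the gguf alternative in (2), CPU-only inference goes through llama.cpp; a minimal sketch using its llama-cpp-python binding, where the file name and parameters are assumptions:

```python
# Minimal sketch: CPU-only gguf inference via llama-cpp-python,
# the Python binding for llama.cpp. The file path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./weblab-10b-instruction-sft.gguf", n_ctx=2048)
out = llm("What is quantization?", max_tokens=64)
print(out["choices"][0]["text"])
```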