Using the transformers library to download the files
Can the quantized models be downloaded using the transformers library? I tried the following code and it returned: OSError: TheBloke/vicuna-13b-1.1-GGML does not appear to have a file named config.json.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("TheBloke/vicuna-13b-1.1-GGML")
model = AutoModelForCausalLM.from_pretrained("TheBloke/vicuna-13b-1.1-GGML")
How do we specify which quantized model (e.g. the 4-bit one) to use?
Thanks!
No, transformers can't handle GGML files in any way.
But ctransformers can, including downloading and loading individual GGML files: https://github.com/marella/ctransformers
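As a minimal sketch of how that looks, you can pass model_file to pick a specific quantized file from the repo (the exact filename below is an assumption; check the repo's file list for the actual names):

from ctransformers import AutoModelForCausalLM

# model_file selects which quantized GGML file to download and load;
# the filename here is an assumed example from the repo's file list.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/vicuna-13b-1.1-GGML",
    model_file="vicuna-13b-1.1.ggmlv3.q4_0.bin",
    model_type="llama",  # vicuna is a llama-architecture model
)

print(llm("Q: What is the capital of France? A:", max_new_tokens=32))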
It can provide an OpenAI-compatible API, so yes, it should have a chat API mode. I haven't tried it myself though.
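For illustration, here's what calling an OpenAI-compatible chat endpoint looks like from the client side; the URL, port, and model name are assumptions and depend on how the server is started:

import requests

# Assumes an OpenAI-compatible server is already running locally;
# localhost:8000 and the model name are hypothetical placeholders.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "vicuna-13b-1.1-GGML",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])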
I downloaded llama-2-7b-chat.ggmlv3.q4_0.bin, and when I try to load the model using ctransformers I get a GLIBC_2.29 compatibility error; I'm using RHEL, which supports GLIBC_2.17. Can I use any other model format instead of GGML?
PS: I don't have much GPU memory.