Using the transformers library to download the files
Can the quantized models be downloaded using the transformers library? I tried the following code and it returned: OSError: TheBloke/vicuna-13b-1.1-GGML does not appear to have a file named config.json.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("TheBloke/vicuna-13b-1.1-GGML")
model = AutoModelForCausalLM.from_pretrained("TheBloke/vicuna-13b-1.1-GGML")
How do we specify which quantized model (e.g. the 4-bit one) to use?
Thanks!
No, transformers can't handle GGML files in any way.
But ctransformers can, including downloading and loading individual GGML files: https://github.com/marella/ctransformers
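As a minimal sketch of how that looks, you can pass model_file to pick a specific quantized file from the repo (the exact filename below is an assumption; check the repo's file list for the actual names):

from ctransformers import AutoModelForCausalLM

# model_file selects which quantized GGML file to download and load;
# the filename here is an assumed example from the repo's file list.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/vicuna-13b-1.1-GGML",
    model_file="vicuna-13b-1.1.ggmlv3.q4_0.bin",
    model_type="llama",  # vicuna is a llama-architecture model
)

print(llm("Q: What is the capital of France? A:", max_new_tokens=32))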
It can provide an OpenAI-compatible API, so yes, it should have a chat API mode. I haven't tried it myself though.
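For illustration, here's what calling an OpenAI-compatible chat endpoint looks like from the client side; the URL, port, and model name are assumptions and depend on how the server is started:

import requests

# Assumes an OpenAI-compatible server is already running locally;
# localhost:8000 and the model name are hypothetical placeholders.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "vicuna-13b-1.1-GGML",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])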
I downloaded llama-2-7b-chat.ggmlv3.q4_0.bin, and when I try to load the model using ctransformers I get a GLIBC_2.29 compatibility error; I'm using RHEL, which supports GLIBC_2.17. Can I use any other model format instead of GGML?
PS: I don't have much GPU memory.