Totally unusable from the 4bit-32g branch (screenshots included)

#15
by anon7463435254 - opened

[Screenshot: model settings.PNG]

[Screenshot: parameters.PNG]

[Screenshot: instruction template.PNG]

All the responses are like the following:

[Screenshot: chat.PNG]

Why is that?

Thanks.

Hmm yeah you're right. AutoGPTQ is producing gibberish with this file.

In any case I would recommend you use ExLlama as the Loader, as it will be much faster than AutoGPTQ. And it works fine with this file, I just tested it.
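In the webui that just means picking ExLlama from the Loader dropdown in the Model tab and reloading the model. If you'd rather drive it directly from Python, here's a minimal sketch modelled on the basic example script in the turboderp/exllama repo (run from the repo root); the model directory path is a placeholder for a local download of the 4bit-32g branch, so treat the details as assumptions rather than a tested recipe:

```python
# Minimal sketch based on exllama's example_basic.py (turboderp/exllama).
# The model directory is a placeholder - point it at a local download of
# the 4bit-32g branch.
import glob
import os

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/path/to/4bit-32g"  # placeholder

# Config points at the branch's config.json and quantized .safetensors file
config = ExLlamaConfig(os.path.join(model_dir, "config.json"))
config.model_path = glob.glob(os.path.join(model_dir, "*.safetensors"))[0]

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(os.path.join(model_dir, "tokenizer.model"))
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Tell me about AI", max_new_tokens=64))
```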

But I need to investigate why AutoGPTQ cannot do inference from this file, and I will report that as a bug.

It's a bug in AutoGPTQ 0.3.0.

If you really want to use AutoGPTQ for some reason, please downgrade to AutoGPTQ 0.2.2 and it will work - but it will be slow.
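For example, a minimal sketch of the usual auto-gptq loading pattern after pinning the older version; the local path and `model_basename` are placeholders, not tested values - `model_basename` should be the `.safetensors` filename from the 4bit-32g branch, minus the extension:

```python
# Minimal sketch, assuming auto-gptq 0.2.2:
#   pip install auto-gptq==0.2.2
# model_dir and model_basename are placeholders for a local download of the
# 4bit-32g branch.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "/path/to/4bit-32g"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename="gptq_model-4bit-32g",  # assumption: match your filename
    use_safetensors=True,
    device="cuda:0",
)

# Quick sanity check that inference produces coherent text, not gibberish
prompt = "Tell me about AI"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```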

I will report this as a bug in AutoGPTQ, but I don't know when it might be fixed.

So, to summarise:

  1. I recommend you use ExLlama anyway, as it is faster
  2. If you really want to use AutoGPTQ, downgrade to 0.2.2
  3. I have raised this as a bug in 0.3.0, which you can track here: https://github.com/PanQiWei/AutoGPTQ/issues/201

Thank you very much, man. I also found a possible bug with the GGML files. Hoping to help, I'm going to open a discussion on 13B-chat-ggml.

anon7463435254 changed discussion status to closed
