slow prompt processing
Model seems great, but it takes forever to process the prompt (much longer than to generate the response). Is it an issue on my end or is it a problem with the model? I'm using the q4 quant.
Seems fine on my side: the initial prompt processing is slow, but once the context has been processed (on koboldcpp), each subsequent generation is very fast.
Be sure to use the correct setting (CuBLAS).
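For reference, enabling CuBLAS in koboldcpp is done at launch. A rough sketch of what that invocation looks like (flag names and the layer count here are assumptions; check `python koboldcpp.py --help` for your version):

```shell
# Hypothetical koboldcpp launch with CuBLAS acceleration enabled.
# --gpulayers controls how many layers are offloaded to the GPU;
# 35 is an example value, not a recommendation for this model.
python koboldcpp.py --model model-q4.gguf --usecublas --gpulayers 35
```

Without GPU offload enabled, prompt processing falls back to the CPU, which is usually the slowest path.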
Oh, so it is the model. I was comparing it to your other model, and it was so slow in comparison that I thought it was bugged. It's fine at low context, but once you're past 4k and change anything, it takes about 5 minutes to get a response. That gets tiring quickly if you like to edit things like I do.
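For what it's worth, the reason editing hurts so much is that llama.cpp-based backends like koboldcpp can only reuse the cached context up to the first token that changed; everything after the edit is re-processed from scratch. A rough illustration (the function name and token lists are made up for the example):

```python
def reusable_prefix(old_tokens, new_tokens):
    """Count how many leading tokens match between the cached prompt
    and the new prompt. Only this prefix of the KV cache can be reused;
    every token after the first difference must be re-processed."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Appending to the end keeps the whole cache warm:
print(reusable_prefix([1, 2, 3, 4], [1, 2, 3, 4, 5]))  # 4 of 4 reused

# Editing one token near the start throws almost everything away:
print(reusable_prefix([1, 2, 3, 4], [1, 9, 3, 4]))     # only 1 of 4 reused
```

So an edit early in a 4k+ prompt forces nearly the full prompt-processing cost again, which matches the multi-minute waits described above.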
I found ooba to be much faster than koboldcpp for this, around 3x. Still slower than the usual time for other models, but it's usable now.