4bpw

opened by Nephilim

Hi, would it be possible for you to make an exact 4bpw quant?

Sure, what are you trying to fit it on where 4bpw would fit better?

An RTX 4060 Ti 16GB.
I'm currently using a 4bpw version of the base Buttercup model and it fits perfectly on my card with the max context (32k).

Hmm, that seems surprising; by my math, 32k context at 4bpw should take ~16.7 GB, but I'll make it and check whether I'm calculating wrong.
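For reference, here's a back-of-envelope sketch of where that ~16.7 GB figure comes from, assuming a Mixtral-style 4x7B (~24B total parameters, 32 layers, 8 KV heads of dim 128) with an FP16 KV cache; the shapes are assumptions for illustration, not pulled from the model config:

```python
# Rough VRAM estimate for a 4bpw EXL2 quant of a Mixtral-style 4x7B.
# All shapes below are assumptions for illustration.
total_params = 24.2e9  # ~24B total parameters for a 4x7B MoE
bpw = 4.0              # bits per weight of the quant
n_layers = 32
n_kv_heads = 8         # grouped-query attention
head_dim = 128
context = 32768
cache_bytes = 2        # FP16 KV cache; halves with the 8-bit cache option

weights_gb = total_params * bpw / 8 / 1e9
# K and V each store context * n_kv_heads * head_dim elements per layer
kv_cache_gb = 2 * n_layers * context * n_kv_heads * head_dim * cache_bytes / 1e9

print(f"weights:  {weights_gb:.1f} GB")                # ~12.1 GB
print(f"KV cache: {kv_cache_gb:.1f} GB")               # ~4.3 GB
print(f"total:    {weights_gb + kv_cache_gb:.1f} GB")  # ~16.4 GB + overhead
```

With activation buffers on top, that lands right around 16.7 GB, past what a 16 GB card can hold.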

@Nephilim 4.0 is up: https://huggingface.co/bartowski/Buttercup-4x7B-V2-laser-exl2/tree/4_0

Let me know if it works and what your final usage looks like; if it makes more sense for a 16GB card, I'll add it for future quants of this size.
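In case it helps anyone else: the EXL2 quants in this repo live one bpw per branch, so you have to request the revision explicitly when downloading. A minimal sketch with huggingface_hub (the local directory name is just an example):

```python
# Download the 4.0bpw branch of the EXL2 repo; local_dir is an example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/Buttercup-4x7B-V2-laser-exl2",
    revision="4_0",  # each bpw lives on its own branch
    local_dir="Buttercup-4x7B-V2-laser-exl2-4_0",
)
```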

Oh, thanks, I will test it.

Worked very well here, thanks again.

@Nephilim Are you sure that you're not overflowing onto system RAM? When loading the 4bpw model with 32k context I hit 16.8GB usage

100% sure; I've disabled system memory fallback, and it runs at ~10 tokens/s here.

Fascinating... what's your setup? I wonder if TGWUI adds some overhead that I don't realize.

I'm using the latest version of oobabooga, with the 8-bit cache option enabled.

Ahhhh, the 8-bit cache explains it!
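That closes the gap with the earlier estimate: the 8-bit cache stores one byte per KV element instead of two, halving the cache term. Redoing the arithmetic under the same assumed shapes:

```python
# Same assumptions as the earlier sketch, but with the 8-bit KV cache.
weights_gb = 12.1                  # ~24B params at 4 bits per weight
fp16_cache_gb = 4.3                # FP16 KV cache at 32k context
int8_cache_gb = fp16_cache_gb / 2  # 1 byte per element instead of 2

print(f"total: {weights_gb + int8_cache_gb:.1f} GB")  # ~14.2 GB, fits in 16 GB
```

That explains how it fits on a 16 GB card even at the full 32k context.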
