OOM with int4 quant

#8
by chungimungi - opened

I get an OOM error with 32 GB of VRAM when loading the int4 quant. Since this is only a 16B-parameter model, it should fit on the GPU. I can easily run gemma-2 27B in int4 with the same specs.
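For reference, here is a rough back-of-the-envelope sketch of the weight footprint behind that expectation. It counts weights only and assumes exactly 4 bits per parameter; it ignores quantization scales/zero points, any layers left unquantized, activations, the KV cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB (assumption: no overhead)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1024**3


# Parameter counts taken from the model names in this thread (approximate).
model_16b = weight_memory_gib(16e9, 4)
gemma_27b = weight_memory_gib(27e9, 4)

print(f"16B @ int4: {model_16b:.1f} GiB")
print(f"27B @ int4: {gemma_27b:.1f} GiB")
```

By this estimate both sets of weights are well under 32 GB, which is why the OOM looks unexpected. One possible explanation (a guess, not confirmed here) is that the weights are first materialized in a higher precision before being quantized.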
