Will this fit on a single 24 GB video card (4090)?

#2
by clevnumb - opened

If so, what context length can it support? Thank you.

Unfortunately not. The model files add up to more than 24 GB, so they won't fit in the card's VRAM without overflowing. On a single 24 GB card you can run a 70B model at 2.4bpw with a few thousand tokens of context, depending on what other programs are using your GPU's VRAM. A 120B model needs more than 24 GB, so you would need a second GPU of some sort with 8+ GB of VRAM.
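
For a rough sense of where these numbers come from, here is a minimal sketch of the arithmetic. It assumes weight size is simply parameters times bits per weight, treats 1 GB as 1e9 bytes, and ignores the KV cache, activations, and loader overhead. The function names `weight_size_gb` and `headroom_gb` are made up for illustration and are not part of any loader.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def headroom_gb(params_billion: float, bits_per_weight: float,
                vram_gb: float = 24.0) -> float:
    """VRAM left after loading the weights; negative means they do not fit.

    Assumption: everything else (context, other programs) must fit in this.
    """
    return vram_gb - weight_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    size = weight_size_gb(70, 2.4)
    print(f"70B at 2.4 bpw: ~{size:.1f} GB of weights")                 # ~21.0 GB
    print(f"Headroom on a 24 GB card: ~{headroom_gb(70, 2.4):.1f} GB")  # ~3.0 GB
```

The few GB left over is what the context and any other programs on the GPU have to share, which is why only a few thousand tokens fit. To check another model, plug in its parameter count and the bpw of the quant you are downloading.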
