How come I can't split this model across multiple GPUs?
Is there any rhyme or reason why, out of all the LoneStriker exl2 models I have downloaded from 70B to 120B, this is the only one that refuses to span across multiple GPUs? I'm running 3 A40s and have split countless models across them, but this one refuses to split: it fills GPU 0 and then goes out of memory.
You generally have to set the first GPU to use much less VRAM than the others to reserve room for the context (KV cache). The Qwen models have a much larger vocab size, so they will use significantly more memory than a LLaMA- or Mistral-type model. You can also drop the context size lower (try 2k, for example) to get it to load and test the model.
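For concreteness, here's a minimal sketch of a manual split using the exllamav2 Python API; the model path and the exact split values are illustrative assumptions, not settings confirmed in this thread:

```python
# Sketch: loading an exl2 quant across three A40s with exllamav2,
# leaving extra headroom on GPU 0. Path and numbers are hypothetical.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache

config = ExLlamaV2Config()
config.model_dir = "/models/Qwen-72B-exl2"  # hypothetical local path
config.prepare()
config.max_seq_len = 2048  # start small to see the real VRAM footprint

model = ExLlamaV2(config)
# Per-GPU VRAM budgets in GB. GPU 0 is capped well under its 48 GB so
# the cache and the large Qwen vocab have room to fit.
model.load(gpu_split=[30, 48, 48])

cache = ExLlamaV2Cache(model)
```

If you're loading through text-generation-webui instead, the same idea applies via its `--gpu-split` option (a comma-separated list of GB per device, e.g. `30,48,48`).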
Okay, I'll try that.
I usually start with 2k context anyway, just to see how much VRAM it will require, but I think I had it set to 45 of 48 on GPU 0. I'll try lower.
I guess I'm out of my depth here. I can't get it to even attempt to span across GPUs, no matter what I set GPU 0 to; it just keeps filling GPU 0 to 100% and never splits. This is the first time I've had this issue, so I guess it's what you said: it's not LLaMA- or Mistral-based, and I've had no trouble loading dozens of different 70-120B models of those types. Ah well, I appreciate you trying to help, but it gets expensive troubleshooting one model on rented GPUs when so many others work fine.
Thanks for trying!