How much CUDA memory is needed to load GLM?
As the title says: I'm on an A5000 with 24 GB of VRAM, confirmed to be otherwise unused, and I still hit CUDA out of memory at "model = AutoModelForCausalLM.from_pretrained(......)......".
Reference: https://github.com/THUDM/GLM-4/blob/main/basic_demo/README.md#glm-4v-9b
The model requires at least 28 GB of VRAM when using bf16.
btw, the model name "glm-4v-9b" may be misleading: it suggests the model has only 9B parameters, but it actually has about 13B (see https://github.com/THUDM/CogVLM2?tab=readme-ov-file#recent-updates). That also squares with the 28 GB figure: 13B parameters at 2 bytes each in bf16 is roughly 26 GB for the weights alone, before activations.
Thanks!
It needs 28 GB of VRAM. You can load it in 4-bit instead.
How do I load it in 4-bit? What do I need to change?
# Load the weights in 4-bit via bitsandbytes to fit within 24 GB of VRAM.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    low_cpu_mem_usage=True
).eval()
Thanks! But after that I hit an error:
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same
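This mismatch typically means the 4-bit layers compute in fp16 (the bitsandbytes default) while the model's conv inputs are bf16. A minimal sketch of one possible fix, not verified against glm-4v-9b: pin both the quantized compute dtype and the model dtype to bfloat16. MODEL_PATH here is an assumed repo id for this thread's model.

# Sketch: align the 4-bit compute dtype with the model's bf16 weights so
# conv inputs and biases end up in the same dtype.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_PATH = "THUDM/glm-4v-9b"  # assumption for this thread's model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # default is fp16, which clashes with bf16 inputs
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,  # keep the non-quantized modules in bf16 too
    low_cpu_mem_usage=True,
).eval()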
What about single-machine multi-GPU? How should the code be changed? Our box has two cards with 22 GiB each, and it also fails with Out of Memory; it turns out only one card is being used.
The latest code already supports this: pass device_map="auto" and the model is distributed across the GPUs automatically.
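For reference, a minimal sketch of what that looks like, assuming Accelerate is installed (pip install accelerate) and the same hypothetical MODEL_PATH as above. With bf16 weights around 26 GB, splitting across two 22 GiB cards should leave some headroom for activations.

# Sketch: shard the model across all visible GPUs with an automatic device map.
# Requires `pip install accelerate`; MODEL_PATH is assumed as above.
import torch
from transformers import AutoModelForCausalLM

MODEL_PATH = "THUDM/glm-4v-9b"  # assumption for this thread's model

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",  # splits layers across cuda:0 and cuda:1 by free memory
).eval()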