How much CUDA memory is needed to load GLM?
As the title says: I'm on an A5000 with 24 GB of VRAM, confirmed to be otherwise unused, and I still hit CUDA out of memory at "model = AutoModelForCausalLM.from_pretrained(......)......".
Reference: https://github.com/THUDM/GLM-4/blob/main/basic_demo/README.md#glm-4v-9b
The model requires at least 28 GB of VRAM when using bf16.
btw, the model name "glm-4v-9b" may be misleading: it suggests the model has only 9B parameters, but it actually has about 13B (see https://github.com/THUDM/CogVLM2?tab=readme-ov-file#recent-updates). That also squares with the 28 GB figure: 13B parameters at 2 bytes each in bf16 is roughly 26 GB for the weights alone, before activations.
Thanks!
It needs 28 GB of VRAM. You can load it in 4-bit instead.
How do I load it in 4-bit? What do I need to change?
# Load the weights in 4-bit via bitsandbytes to fit within 24 GB of VRAM.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    low_cpu_mem_usage=True
).eval()
Thanks! But after that I hit an error:
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same
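This mismatch typically means the 4-bit layers compute in fp16 (the bitsandbytes default) while the model's conv inputs are bf16. A minimal sketch of one possible fix, not verified against glm-4v-9b: pin both the quantized compute dtype and the model dtype to bfloat16. MODEL_PATH here is an assumed repo id for this thread's model.

# Sketch: align the 4-bit compute dtype with the model's bf16 weights so
# conv inputs and biases end up in the same dtype.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_PATH = "THUDM/glm-4v-9b"  # assumption for this thread's model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # default is fp16, which clashes with bf16 inputs
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,  # keep the non-quantized modules in bf16 too
    low_cpu_mem_usage=True,
).eval()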
What about single-machine multi-GPU? How should the code be changed? Our box has two cards with 22 GiB each, and it also fails with Out of Memory; it turns out only one card is being used.
The latest code already supports this: pass device_map="auto" and the model is distributed across the GPUs automatically.
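For reference, a minimal sketch of what that looks like, assuming Accelerate is installed (pip install accelerate) and the same hypothetical MODEL_PATH as above. With bf16 weights around 26 GB, splitting across two 22 GiB cards should leave some headroom for activations.

# Sketch: shard the model across all visible GPUs with an automatic device map.
# Requires `pip install accelerate`; MODEL_PATH is assumed as above.
import torch
from transformers import AutoModelForCausalLM

MODEL_PATH = "THUDM/glm-4v-9b"  # assumption for this thread's model

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",  # splits layers across cuda:0 and cuda:1 by free memory
).eval()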