Can this model work with vLLM?
As title, how can I serve this model with vLLM?
I think not. Neither SGLang nor vLLM supports Blackwell for now.
vLLM now supports Blackwell, but Blackwell is not mandatory; you can run it on older devices, just without the newer acceleration.
It's like Transformer Engine: you can run it in FP32 on Turing, FP16 on Ampere, FP8 on Ada and Blackwell, and FP4 on Blackwell only.
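A minimal sketch of what this looks like in practice: you pick a dtype (and optionally a quantization) that your GPU generation supports when launching the server. The model name and port below are placeholders, not from this thread; check `vllm serve --help` for the exact flags in your version.

```shell
# On an Ampere card (no FP8/FP4 hardware support), serve in FP16.
# <model-id> is a placeholder for the model you want to serve.
vllm serve <model-id> \
    --dtype float16 \
    --port 8000

# On Ada/Blackwell you could instead use an FP8-quantized checkpoint,
# e.g. via --quantization fp8 (if your vLLM build supports it).
```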
Oh really? I tried to build vLLM inside `nvcr.io/nvidia/pytorch:25.02-py3` on a B200, but it wasn't successful: the build kept downgrading the torch version, even when I built from the `update-torch-2.6.0` branch.
Could you kindly share a reference on how to do that?
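One approach that may avoid the torch downgrade is to build against the container's preinstalled torch instead of letting pip resolve its own. This is a sketch, not a verified recipe for this exact container: the `use_existing_torch.py` helper comes from the vLLM repo, but verify that it exists in your checkout and consult the vLLM "build from source" docs for your version.

```shell
# Inside the NGC PyTorch container, which already ships a CUDA-enabled torch:
git clone https://github.com/vllm-project/vllm.git
cd vllm

# Rewrite the requirement pins so the build reuses the installed torch
# instead of pulling (and downgrading to) a pinned version.
python use_existing_torch.py

# --no-build-isolation makes pip build against the current environment,
# so the preinstalled torch is not replaced during the build.
pip install -e . --no-build-isolation
```

If the build still tries to fetch torch, double-check that no `requirements*` file in your checkout re-pins it.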