Can this model work with vLLM?
As title, how can I serve this model with vLLM?
I think not. Neither SGLang nor vLLM supports Blackwell for now.
vLLM now supports Blackwell, but Blackwell is not mandatory; you can run it on older devices, just without the newer acceleration.
It's like Transformer Engine: you can run it in FP32 on Turing, FP16 on Ampere, FP8 on Ada and Blackwell, and FP4 on Blackwell only.
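A minimal sketch of what this looks like in practice: you pick a dtype (and optionally a quantization) that your GPU generation supports when launching the server. The model name and port below are placeholders, not from this thread; check `vllm serve --help` for the exact flags in your version.

```shell
# On an Ampere card (no FP8/FP4 hardware support), serve in FP16.
# <model-id> is a placeholder for the model you want to serve.
vllm serve <model-id> \
    --dtype float16 \
    --port 8000

# On Ada/Blackwell you could instead use an FP8-quantized checkpoint,
# e.g. via --quantization fp8 (if your vLLM build supports it).
```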
Oh really? I tried to build vLLM inside `nvcr.io/nvidia/pytorch:25.02-py3` on a B200, but it wasn't successful: the build kept downgrading the torch version, even when I built from the `update-torch-2.6.0` branch.
Could you kindly share a reference on how to do that?
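One approach that may avoid the torch downgrade is to build against the container's preinstalled torch instead of letting pip resolve its own. This is a sketch, not a verified recipe for this exact container: the `use_existing_torch.py` helper comes from the vLLM repo, but verify that it exists in your checkout and consult the vLLM "build from source" docs for your version.

```shell
# Inside the NGC PyTorch container, which already ships a CUDA-enabled torch:
git clone https://github.com/vllm-project/vllm.git
cd vllm

# Rewrite the requirement pins so the build reuses the installed torch
# instead of pulling (and downgrading to) a pinned version.
python use_existing_torch.py

# --no-build-isolation makes pip build against the current environment,
# so the preinstalled torch is not replaced during the build.
pip install -e . --no-build-isolation
```

If the build still tries to fetch torch, double-check that no `requirements*` file in your checkout re-pins it.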