Error Loading EXAONE3.5-32B Model with vLLM
#3
by unpieceof - opened
I'm encountering an error while attempting to load the EXAONE3.5-32B model using vLLM on GPU. The issue manifests differently depending on the GPU configuration.

Environment:
- vLLM: 0.6.3
- GPU: H100
When loading in parallel across 2 GPUs:
- All 27/27 safetensors load successfully
- The error occurs only after loading completes

When loading on a single GPU:
- Loading stops at 20/27 safetensors
- The error occurs during tensor loading
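
For reference, the load is invoked roughly as follows (a minimal sketch; the exact model id and arguments here are illustrative of my setup):

```python
from vllm import LLM

# Minimal sketch of the load that hits the error.
# tensor_parallel_size=2 for the two-GPU case, 1 for the single-GPU case.
llm = LLM(
    model="LGAI-EXAONE/EXAONE-3.5-32B-Instruct",
    tensor_parallel_size=2,
    trust_remote_code=True,  # the EXAONE repo ships a custom config class
)
```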
The error message received in both cases is:
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
Could anyone help identify what might be causing this semaphore leak and how to resolve it? Any assistance would be greatly appreciated.
Hi @unpieceof! Sorry for the late reply. Can you try `dtype=bfloat16`?
Could you also give us more information for testing (e.g., a script using `from vllm import LLM`, or your `vllm serve` command)?
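
For example, a minimal sketch with the dtype set explicitly (the model id and parallelism here are illustrative, so adjust them to your setup):

```python
from vllm import LLM

# Load with dtype forced to bfloat16 instead of relying on the checkpoint default.
llm = LLM(
    model="LGAI-EXAONE/EXAONE-3.5-32B-Instruct",
    dtype="bfloat16",
    tensor_parallel_size=2,
    trust_remote_code=True,
)

# Equivalent CLI form:
#   vllm serve LGAI-EXAONE/EXAONE-3.5-32B-Instruct --dtype bfloat16 --tensor-parallel-size 2
```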