Help! Hoping to get an inference configuration that runs on multiple GPUs.

#25
by Lokis - opened

I have an 8×A100 80GB server, but after a lot of testing I still can't get a stable multi-GPU configuration. When using Auto, the output is very slow. Are there any documents or open-source configuration files I can learn from?
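For context, "Auto" here presumably refers to loading with `device_map="auto"` in transformers (an assumption, since the thread does not show the actual code). A minimal sketch of that kind of setup, with a placeholder model ID, would look roughly like this; note that `device_map="auto"` shards layers across GPUs pipeline-style, so only one GPU is typically busy at a time, which can explain slow generation:

```python
# Minimal sketch, assuming transformers + accelerate and an unnamed causal-LM checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-model-id"  # placeholder; the thread does not name a model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # spreads layers across the 8 GPUs
    torch_dtype=torch.float16,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```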

Hey @Lokis and everyone else watching, check out vLLM for faster inference on multiple GPUs.
In my experience the transformers implementation achieves low GPU utilization, hence its slow inference speed.
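As a rough sketch of what that looks like (assuming a recent vLLM release and a placeholder model ID, since the thread doesn't name one), tensor parallelism across all 8 A100s is a single argument:

```python
# Minimal sketch, not a tuned configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # placeholder; replace with your model
    tensor_parallel_size=8,             # shard the model across all 8 A100s
    dtype="float16",
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], sampling)
for out in outputs:
    print(out.outputs[0].text)
```

Unlike `device_map="auto"`, tensor parallelism keeps all GPUs active on every forward pass, which is usually where the throughput gain comes from.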
