Package Dependencies to Use This Model
Which versions of transformers, accelerate, torch, fbgemm-gpu, etc. are required to run this model? When I try to download it from the model hub, I run into the following error:
RuntimeError: Failed to import transformers.integrations.fbgemm_fp8 because of the following error (look up to see its traceback):
No module named 'fbgemm_gpu.experimental'
My package versions are as follows:
- transformers==4.43.1
- accelerate==0.33.0
- torch==2.3.1
- fbgemm-gpu==0.7.0+cu121, installed with `pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cu121/`
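For reference, here is roughly the code that triggers the error; it fails while importing transformers' fbgemm_fp8 integration, before any weights download. The model ID below is a sketch (I'm assuming the FP8 checkpoint name):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed FP8 repo name; substitute the checkpoint you are loading.
model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# This call triggers the fbgemm_fp8 integration import that fails above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard layers across all visible GPUs
    torch_dtype="auto",  # keep the dtypes stored in the checkpoint
)
```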
Try installing the nightly version of fbgemm-gpu: `pip install --pre fbgemm-gpu --index-url https://download.pytorch.org/whl/nightly/cu121/`
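You can verify the install with a quick import check. A minimal sketch, assuming the missing submodule from your traceback is the experimental GenAI ops that only ship in recent/nightly builds:

```python
# Sanity check that the installed fbgemm-gpu exposes the module
# transformers' fbgemm_fp8 integration needs.
import fbgemm_gpu
print(fbgemm_gpu.__version__)

# This is the import that fails on the stable 0.7.0 wheel:
import fbgemm_gpu.experimental.gen_ai  # noqa: F401
print("fbgemm_gpu.experimental.gen_ai is available")
```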
I found this Hugging Face documentation, which led to this PyTorch documentation.
I followed the guide there and got the quantized model to work with the nightly fbgemm-gpu build on an 8xH100-80GB machine.
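In case it helps anyone else, the loading code boiled down to the snippet below (a sketch, assuming the pre-quantized FP8 repo name; the guide also shows quantizing a bf16 checkpoint on load via FbgemmFp8Config):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, FbgemmFp8Config

# The pre-quantized checkpoint (assumed repo name) already stores an
# fbgemm_fp8 quantization_config, so transformers applies it automatically.
model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # spread the model across the 8 H100s
    torch_dtype=torch.bfloat16,  # dtype for the non-quantized modules
)

# Alternative from the guide: quantize a bf16 checkpoint to FP8 at load time.
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Meta-Llama-3.1-405B-Instruct",
#     device_map="auto",
#     quantization_config=FbgemmFp8Config(),
# )

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```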
Inference is substantially slower than with the bitsandbytes 4-bit quantization of the meta-llama/Meta-Llama-3.1-405B-Instruct model, but it seems to work.
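For comparison, the bitsandbytes 4-bit baseline was loaded roughly like this (a sketch of the standard NF4 setup, not my exact script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Standard 4-bit NF4 quantization applied at load time via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
)
```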
Thanks a lot for the help! I was able to load the model successfully with a fresh install of torch and the nightly build of fbgemm-gpu (`pip install fbgemm-gpu-nightly`). In the end, however, I went with the latest vLLM release from earlier today and the instructions in their blog post.
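For anyone who lands here later, the vLLM route came down to something like the following (a sketch of vLLM's offline Python API; the model ID and tensor_parallel_size=8 are my assumptions for an 8-GPU node, not the blog post's exact settings):

```python
from vllm import LLM, SamplingParams

# Assumed FP8 repo name and an 8-way tensor-parallel split.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["Explain FP8 quantization in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```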