Package Dependencies to Use This Model

#6
by adevaraj - opened

What versions of transformers, accelerate, torch, fbgemm-gpu, etc. are required to run this model? When I try to download it from the model hub, I run into the following error:

RuntimeError: Failed to import transformers.integrations.fbgemm_fp8 because of the following error (look up to see its traceback):
No module named 'fbgemm_gpu.experimental'

My package versions are as follows:

  • transformers==4.43.1
  • accelerate==0.33.0
  • torch==2.3.1
  • fbgemm-gpu==0.7.0+cu121, installed with pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cu121/

Try installing the nightly version of fbgemm-gpu: pip install --pre fbgemm-gpu --index-url https://download.pytorch.org/whl/nightly/cu121/
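Once the nightly is installed, a quick sanity check (a minimal sketch; the module path comes straight from the traceback above) confirms that the import the transformers integration needs actually resolves:

```python
# The transformers fbgemm_fp8 integration fails when this module is missing,
# so confirm it resolves after installing the nightly build.
import fbgemm_gpu.experimental  # raises ModuleNotFoundError on the stable 0.7.0 wheel

print("fbgemm_gpu.experimental imported OK")
```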

I found this Hugging Face documentation, which led to this PyTorch documentation.

I followed the guide there and got the quantized model to work with the nightly fbgemm-gpu build on an 8xH100-80GB machine.
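For reference, a minimal loading sketch of what worked for me (the model ID is my assumption for this repo's FP8 checkpoint; the quantization config is picked up from the checkpoint itself, so no explicit quantization argument is needed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID for this repo's FP8 checkpoint; adjust if different.
model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the model across all 8 GPUs; the FP8 quantization
# config is read from the checkpoint's config.json.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```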

Inference is substantially slower than with the bitsandbytes 4-bit quantization of the meta-llama/Meta-Llama-3.1-405B-Instruct model, but it seems to work. The 4-bit setup I compared against is sketched below.
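For context, the bitsandbytes 4-bit load looked roughly like this (a sketch with the usual NF4 defaults, not settings prescribed by the repo):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Standard NF4 4-bit quantization via bitsandbytes (assumed settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```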

Thanks a lot for the help! I was able to load the model successfully with a fresh install of torch and the nightly build of fbgemm-gpu via pip install fbgemm-gpu-nightly. In the end, however, I went with the latest vLLM release from earlier today and the instructions in their blog post.
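In case it helps anyone else, a minimal vLLM sketch (model ID and parallelism are my assumptions for an 8xH100 node, not taken from their blog post; vLLM detects the FP8 quantization from the checkpoint config):

```python
from vllm import LLM, SamplingParams

# Assumed model ID and tensor parallelism for an 8xH100 machine; adjust to your setup.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["Explain FP8 quantization in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```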

adevaraj changed discussion status to closed
