Package Dependencies to Use This Model
Which versions of transformers, accelerate, torch, fbgemm-gpu, etc. are required to run this model? When I try to download it from the model hub, I run into the following error:
RuntimeError: Failed to import transformers.integrations.fbgemm_fp8 because of the following error (look up to see its traceback):
No module named 'fbgemm_gpu.experimental'
My package versions are as follows:
- transformers==4.43.1
- accelerate==0.33.0
- torch==2.3.1
- fbgemm-gpu==0.7.0+cu121, installed with `pip install fbgemm-gpu --index-url https://download.pytorch.org/whl/cu121/`
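For reference, here is roughly the code that triggers the error; it fails while importing transformers' fbgemm_fp8 integration, before any weights download. The model ID below is a sketch (I'm assuming the FP8 checkpoint name):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed FP8 repo name; substitute the checkpoint you are loading.
model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# This call triggers the fbgemm_fp8 integration import that fails above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard layers across all visible GPUs
    torch_dtype="auto",  # keep the dtypes stored in the checkpoint
)
```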
Try installing the nightly version of fbgemm-gpu: `pip install --pre fbgemm-gpu --index-url https://download.pytorch.org/whl/nightly/cu121/`
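You can verify the install with a quick import check. A minimal sketch, assuming the missing submodule from your traceback is the experimental GenAI ops that only ship in recent/nightly builds:

```python
# Sanity check that the installed fbgemm-gpu exposes the module
# transformers' fbgemm_fp8 integration needs.
import fbgemm_gpu
print(fbgemm_gpu.__version__)

# This is the import that fails on the stable 0.7.0 wheel:
import fbgemm_gpu.experimental.gen_ai  # noqa: F401
print("fbgemm_gpu.experimental.gen_ai is available")
```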
I found this Hugging Face documentation, which led to this PyTorch documentation.
I followed the guide there and got the quantized model to work with the nightly fbgemm-gpu build on an 8xH100-80GB machine.
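In case it helps anyone else, the loading code boiled down to the snippet below (a sketch, assuming the pre-quantized FP8 repo name; the guide also shows quantizing a bf16 checkpoint on load via FbgemmFp8Config):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, FbgemmFp8Config

# The pre-quantized checkpoint (assumed repo name) already stores an
# fbgemm_fp8 quantization_config, so transformers applies it automatically.
model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # spread the model across the 8 H100s
    torch_dtype=torch.bfloat16,  # dtype for the non-quantized modules
)

# Alternative from the guide: quantize a bf16 checkpoint to FP8 at load time.
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Meta-Llama-3.1-405B-Instruct",
#     device_map="auto",
#     quantization_config=FbgemmFp8Config(),
# )

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```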
Inference is substantially slower than with the bitsandbytes 4-bit quantization of the meta-llama/Meta-Llama-3.1-405B-Instruct model, but it seems to work.
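For comparison, the bitsandbytes 4-bit baseline was loaded roughly like this (a sketch of the standard NF4 setup, not my exact script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Standard 4-bit NF4 quantization applied at load time via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
)
```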
Thanks a lot for the help! I was able to load the model successfully with a fresh install of torch and the nightly build of fbgemm-gpu (`pip install fbgemm-gpu-nightly`). In the end, however, I went with the latest vLLM release from earlier today and the instructions in their blog post.
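For anyone who lands here later, the vLLM route came down to something like the following (a sketch of vLLM's offline Python API; the model ID and tensor_parallel_size=8 are my assumptions for an 8-GPU node, not the blog post's exact settings):

```python
from vllm import LLM, SamplingParams

# Assumed FP8 repo name and an 8-way tensor-parallel split.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["Explain FP8 quantization in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```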