Model Memory Consumption of Llama-2 models, access granted
I have access to a gated model, for instance meta-llama/Llama-2-7b, and I want to check its memory consumption. I was granted access through the link I received by email. How can I get an API token to enter into the model memory tool? I cannot find the option to generate one.
Hi! You can get the API token here. However, you won't be able to get the memory consumption of meta-llama/Llama-2-7b because it is not compatible with transformers (it has no config.json file). Instead, you can check the memory consumption of meta-llama/Llama-2-7b-hf, which works with the transformers library.
Thank you for the clarification. I just tested meta-llama/Llama-2-7b-hf and I am still getting an error, with both a read token and a write token.
Indeed, there is an issue with the deployment of the Space. @muellerzr is looking into it. In the meantime, you can try the tool directly by installing accelerate from source: pip install git+https://github.com/huggingface/accelerate.git, then run the following CLI command: accelerate estimate-memory meta-llama/Llama-2-7b-hf. Check out the docs as well.
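As a rough cross-check of what the estimator reports, the weight memory can be approximated by multiplying the parameter count by the bytes per parameter for a given dtype. This is only a back-of-envelope sketch (the ~7 billion parameter count is approximate, and it ignores activations, KV cache, and optimizer state):

```python
# Back-of-envelope weight-memory estimate: parameters * bytes per parameter.
# Ignores activations, KV cache, and optimizer state.
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

def estimate_weights_gib(num_params: float, dtype: str) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * DTYPE_BYTES[dtype] / 1024**3

# Llama-2-7B has roughly 7 billion parameters.
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {estimate_weights_gib(7e9, dtype):.1f} GiB")
```

For float16, for example, this gives about 13 GiB for the weights, which is in the same ballpark as what the accelerate tool reports for inference.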
Thank you
This should be fixed now. Since the original weights aren't compatible with transformers, do know that pointing to meta-llama/Llama-2-x will check meta-llama/Llama-2-x-hf instead, since that is compatible :)
Thank you. I do not know what I am doing wrong, but I still cannot get it to work with meta-llama/Llama-2-7b-hf. How do I generate the access token, and should I specify the model name as meta-llama/Llama-2-7b-hf?
It's your personal access token, from here: huggingface.co/settings/tokens
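For reference, one way to make that token available to local tools such as the accelerate CLI is to log in with the huggingface_hub client (a sketch; hf_xxxxxxxx is a placeholder you replace with your own token from the settings page):

```shell
# Install the Hugging Face Hub client and log in with your personal token.
# hf_xxxxxxxx is a placeholder; a token with read scope is enough.
pip install -U huggingface_hub
huggingface-cli login --token hf_xxxxxxxx
```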
Hi all, this will be solved now thanks to this PR in Accelerate: https://github.com/huggingface/accelerate/pull/2327
Once it is merged, I'll factory reset the Space.