HuggingChat: Input validation error: `inputs` tokens + `max_new_tokens` must be..
I use the meta-llama/Meta-Llama-3-70B-Instruct model. After a certain number of exchanges, the AI stops working and gives an error: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6391 `inputs` tokens and 2047 `max_new_tokens`". Is this a bug or some new limitation? To be honest I still don't understand it, and I hope I can get an answer here. I'm new to this site.
Same issue all of a sudden today.
Can you see if this still happens? Should be fixed now.
I keep getting this error as well. Using CohereForAI
Same error, "Meta-Llama-3-70B-Instruct" model.
I have also been running into this error. Is there a workaround or solution at all?
"Input validation error: inputs
tokens + max_new_tokens
must be <= 8192. Given: 6474 inputs
tokens and 2047 max_new_tokens
"
Using the meta-llama/Meta-Llama-3-70B-Instruct model.
Hi, I saw the above thread and was wondering if it's an issue or a limitation.
I am using meta-llama/Meta-Llama-3.1-70B-Instruct, which has a context window of 128k, but I get this when I send a large input:
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 12682 `inputs` tokens and 4000 `max_new_tokens`
Using Hugging Chat, https://huggingface.co/chat/
Model: meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 16384. Given: 14337 `inputs` tokens and 2048 `max_new_tokens`
Hello,
I have exactly the same error when calling Meta-Llama-3.1-70B-Instruct using Haystack v2.0's HuggingFaceTGIGenerator in the context of a RAG application:
It is very puzzling because Meta-Llama-3.1-70B-Instruct should have a context window size of 128k tokens. This, and the multilingual capabilities, are major upgrades with respect to the previous iteration of the model.
Still, here's the result:
I am calling the model through the Serverless Inference API. Perhaps creating a dedicated, paid Inference Endpoint would solve the issue? Has anyone tried this?
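For reference, here is roughly how I call it. This is a minimal sketch only: the component and parameter names follow Haystack 2.x's HuggingFaceTGIGenerator, and the prompt and generation values are placeholders.

```python
from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret

# Generator backed by the Serverless Inference API; max_new_tokens is the
# generation budget that gets added to the prompt tokens in the limit check.
generator = HuggingFaceTGIGenerator(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    token=Secret.from_env_var("HF_API_TOKEN"),
    generation_kwargs={"max_new_tokens": 512},
)
generator.warm_up()

result = generator.run(prompt="...")  # long RAG prompt assembled upstream
print(result["replies"][0])
```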
Hi,
I had the same problem when using the Serverless Inference API with meta-llama/Meta-Llama-3.1-8B-Instruct. The API only supports a context length of 8k for this model, even though the model itself supports 128k. I got around it by running a private endpoint and changing the 'Container Configuration', specifically the token settings, to whatever length I required.
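For anyone who prefers doing this from code rather than the UI, here is a rough sketch of the same idea using huggingface_hub's create_inference_endpoint. The hardware values, image URL and exact limits are placeholders to adapt; the relevant part is the token settings in the container env:

```python
from huggingface_hub import create_inference_endpoint

# Sketch: dedicated TGI endpoint with larger token limits than the serverless API.
endpoint = create_inference_endpoint(
    "llama-31-8b-long-context",           # endpoint name (placeholder)
    repository="meta-llama/Meta-Llama-3.1-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",                   # placeholder hardware choice
    instance_type="nvidia-a10g",          # placeholder hardware choice
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            "MODEL_ID": "/repository",
            # The 'Container Configuration' token settings from the UI:
            "MAX_INPUT_LENGTH": "16384",   # max prompt tokens per request
            "MAX_TOTAL_TOKENS": "20480",   # prompt tokens + max_new_tokens
        },
    },
)
endpoint.wait()  # block until the endpoint is running
```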
Hi AlbinLidback,
Yes, I ended up doing the same thing and it solved the problem. HuggingFace could save users a lot of frustration by explicitly mentioning this on the model cards.
Hi @AlbinLidback , @JulienGuy
I'm totally new to Hugging Face.
I also got the same problem with meta-llama/Meta-Llama-3.1-8B-Instruct and 70B-Instruct.
Could you share how to "run a private endpoint and change the 'Container Configuration'" with the 128k token length?
Hi @pineapple96 ,
This part is relatively straightforward. Go to the model card (e.g. https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), click on "Deploy" in the top right corner and select "Inference Endpoint". On the next page you can choose what hardware you want to run the model on, which will impact how much you pay per hour. Set "Automatic Scale to Zero" to some value other than "never" so the endpoint switches off after a given amount of time without requests and you aren't paying for it while it's not in use. Then go to "Advanced Configuration" and set the maximum number of tokens to whatever makes sense for your use case. With this procedure you will be able to make full use of the larger context windows of the Llama 3.1 models.
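Once the endpoint is running, calling it is a small change from the serverless setup: point the client at the endpoint URL instead of the model name. A minimal sketch with huggingface_hub follows; the URL, token and limits are placeholders for whatever you configured.

```python
from huggingface_hub import InferenceClient

# Dedicated endpoint URL taken from the endpoint's overview page (placeholder).
client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",
    token="hf_...",  # your Hugging Face access token
)

# With the container's token settings raised, a long prompt plus a large
# max_new_tokens no longer trips the inputs + max_new_tokens check.
output = client.text_generation(
    "Summarize the following document: ...",
    max_new_tokens=4000,
)
print(output)
```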
Thanks a lot for the detailed how-to guide, JulienGuy. Appreciate it!