Number of tokens (525) exceeded maximum context length (512).
#7 opened by ashubi
I'm chatting with documents using TheBloke/Llama-2-7B-GGML, but when I ask a question it warns, "Number of tokens (525) exceeded maximum context length (512)." The number keeps climbing (526, 527, and so on), and eventually the model responds in an unstructured manner. I am running this model on CPU.
Note: the responses are fine for any query that does not trigger the "Number of tokens (525) exceeded maximum context length (512)" warning.
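
From what I can tell, the warning means the model was loaded with the default 512-token context window, so longer prompts spill past it. Below is a minimal sketch of raising the context size via the `n_ctx` parameter, assuming llama-cpp-python is the loader (the model filename and prompt are placeholders, not my actual setup):

```python
from llama_cpp import Llama

# Placeholder path to a GGML quantization of Llama-2-7B.
# n_ctx defaults to 512; Llama 2 supports context windows up to 4096.
llm = Llama(
    model_path="llama-2-7b.ggmlv3.q4_0.bin",
    n_ctx=2048,
)

# Placeholder prompt standing in for the document-chat query.
output = llm("Q: What does the document say about pricing? A:", max_tokens=256)
print(output["choices"][0]["text"])
```

If the document chat goes through LangChain instead, its `LlamaCpp` wrapper exposes the same `n_ctx` parameter, so the same fix should apply there.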