Number of tokens (525) exceeded maximum context length (512).
#7 opened by ashubi
I'm chatting with documents using TheBloke/Llama-2-7B-GGML, but when I ask a question it warns, "Number of tokens (525) exceeded maximum context length (512)." The number keeps climbing (526, 527, and so on), and eventually the model responds in an unstructured manner. I am running this model on CPU.
Note: the responses are fine for any query that does not trigger the "Number of tokens (525) exceeded maximum context length (512)" warning.
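
From what I can tell, the warning means the model was loaded with the default 512-token context window, so longer prompts spill past it. Below is a minimal sketch of raising the context size via the `n_ctx` parameter, assuming llama-cpp-python is the loader (the model filename and prompt are placeholders, not my actual setup):

```python
from llama_cpp import Llama

# Placeholder path to a GGML quantization of Llama-2-7B.
# n_ctx defaults to 512; Llama 2 supports context windows up to 4096.
llm = Llama(
    model_path="llama-2-7b.ggmlv3.q4_0.bin",
    n_ctx=2048,
)

# Placeholder prompt standing in for the document-chat query.
output = llm("Q: What does the document say about pricing? A:", max_tokens=256)
print(output["choices"][0]["text"])
```

If the document chat goes through LangChain instead, its `LlamaCpp` wrapper exposes the same `n_ctx` parameter, so the same fix should apply there.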