Model Continuously Generating Text After Completing Task
#6 by endrikacupaj
Hi,
Thank you for the great work and for open-sourcing the models!
I’ve noticed that the model sometimes keeps generating text indefinitely, even after it has answered the question or completed the task. It seems to fail to emit an EOS token to stop generation, which leads to unnecessary token usage and makes the model harder to use in real applications.
I’m currently running the model with vLLM, and to work around this I have to set the max_tokens argument and post-process the responses.
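For reference, my current workaround looks roughly like this (a minimal sketch; the model path, prompt, and stop string are placeholders, not the actual values I use):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/model")  # placeholder model path

# Hard-cap generation length, plus an explicit stop string as a safety net.
params = SamplingParams(
    temperature=0.7,   # example value
    max_tokens=512,    # bounds runaway generations
    stop=["</s>"],     # placeholder; depends on the chat template
)

outputs = llm.generate(["example prompt"], params)
text = outputs[0].outputs[0].text  # still needs post-processing afterwards
```

This keeps token usage bounded, but picking a max_tokens value per task is awkward, so it’s only a workaround.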
Have you encountered this behavior before? Do you know why it happens, and is there a way to fix it?
Best,
Endri
Hi Endri, thanks for opening this issue. Can you provide any examples and/or specifics about this? The following would help narrow down the behavior (see the repro sketch after the list):
- Sample prompts that you see cause this behavior
- Sampling parameters (temperature, top_k, etc.)
- Description of whether the behavior is deterministic or random
- Characteristics of the load in vLLM that coincides with the behavior
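If it helps, a minimal script along these lines would capture most of that in one go (a sketch; the model path and prompt are placeholders). With temperature=0 the decoding is greedy and deterministic, and finish_reason tells us whether the model actually emitted EOS or just hit the token cap:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/model")  # placeholder model path

# Greedy decoding: temperature=0 removes sampling randomness, so reruns
# show whether the runaway generation is deterministic.
params = SamplingParams(temperature=0.0, max_tokens=1024)

outputs = llm.generate(["prompt that triggers the behavior"], params)
completion = outputs[0].outputs[0]

# "stop" means an EOS/stop token was generated; "length" means generation
# only ended because it hit max_tokens.
print(completion.finish_reason)
print(completion.text)
```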