Very long responses
What can cause infinitely long responses? I'm using koboldcpp 1.56/oobabooga (commit 0f134bf) + SillyTavern 1.11.4 with settings from the model card.
I've tried this model with several characters, and each one writes a wall of text (2000+ tokens) in reply to a single short line from me. I feel like I could have kept pressing "Continue" indefinitely and it still wouldn't have ended. I'm new to LLMs and have only used Kunoichi-DPO-v2-7B and Kuro-Lotus-10.7B before, but they always kept their responses to 100-250 tokens, and I only rarely had to click "Continue".
I like the writing style of this model, but the overly long answers make it impossible to roleplay... What am I doing wrong?
For comparison, this is Kuro-Lotus-10.7B-Q6_K.gguf with the same oobabooga build (commit 0f134bf) and exactly the same settings.
What program do you use to communicate with the AI?
@HyperN0va KoboldCPP, for use with GGUF files.