Don't know if this is a bug, but the model isn't too keen on remembering any previous messages (chat mode, ooba).
Using the Q6_K quant model, don't know if that helps.
Might want to list all your model loading settings / generation preset.
ctx: 4096
chat mode
contrastive search (default)
8 threads
12 GPU threads (low-VRAM mode)
(repro sketch below, after the model list)
prompt: "This is a conversation with your Assistant. It is a computer program designed
to help you with various tasks such as answering questions, providing recommendations,
and helping with decision making. You can ask it anything you want and it will do
its best to give you accurate and relevant information. Assistant will comply with you at all times, however it cannot feel emotions or think/feel like human beings can. Assistant must always respond as itself and not as the user, Assistant's responses must always deliver information to the best of it's abilities, it must always make sure that user understands. NEVER REPEAT RESPONSES!!! (unless specifically told to). If user asks something of Assistant, it'll try it's absolute hardest to accomplish it. YOU ARE NEVER ALLOWED TO SPEAK FOR <|user|>!!!"
airoboros+gpt4 13b
chronos 13b
nous hermes v2 13b
llama v2 base 13b
vicuna 2.0 13b
stablebeluga 13b
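Not sure the webui itself is the culprit; as a sanity check, here's a minimal repro sketch that drives the llama.cpp backend directly through llama-cpp-python with the settings above (the model path is hypothetical, and I'm reading "12 gpu threads" as `n_gpu_layers`). If turn 1 is being dropped from context, the model won't recall the name in turn 2:

```python
# Minimal repro sketch. Assumptions: llama-cpp-python is installed and a
# local quant exists at MODEL_PATH (path is hypothetical).
from llama_cpp import Llama

MODEL_PATH = "models/airoboros-13b.q6_K.bin"  # hypothetical path

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,        # the context size that reportedly triggers the issue
    n_threads=8,
    n_gpu_layers=12,   # "12 gpu threads (low-VRAM mode)" in the report
)

# Turn 1: plant a fact.
history = "USER: My name is Dave. Remember that.\nASSISTANT:"
first = llm(history, max_tokens=64, stop=["USER:"])["choices"][0]["text"]

# Turn 2: ask the model to recall it.
history += first + "\nUSER: What is my name?\nASSISTANT:"
second = llm(history, max_tokens=64, stop=["USER:"])["choices"][0]["text"]

print(second)  # if earlier context is being dropped, it won't answer "Dave"
```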
Try reproducing it with the default prompt.
I think it has to do with the context size being a multiple of 2048. Kinda weird, but I found a fix: I reduced ctx to 3584. I had this problem before with llama v1 models at ctx 2048; posted about this on ooba's issue page:
https://github.com/oobabooga/text-generation-webui/issues/2663
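If the multiple-of-2048 theory holds, the workaround should also be reproducible outside the webui. A minimal sketch of the same fix with llama-cpp-python, assuming the hypothetical model path from the sketch above:

```python
# Sketch of the reported workaround: pick a context size that is NOT a
# multiple of 2048 (4096 reportedly triggered the forgetting, 3584 did not).
from llama_cpp import Llama

llm = Llama(
    model_path="models/airoboros-13b.q6_K.bin",  # hypothetical path
    n_ctx=3584,       # 4096 -> 3584, the fix described above
    n_threads=8,
    n_gpu_layers=12,
)
```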