Should work properly with KoboldCPP and other tools. I changed the EOS token to `<|eot_id|>`, which prevents endless generation. I also converted the original weights back to fp32 before quantization. **Edit: I highly recommend downloading more recent quants.** 17/05/2024
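
If a frontend still lets text run past the end of a turn, one possible workaround is to cut the output at the `<|eot_id|>` marker on the client side. The snippet below is only a minimal sketch of that idea in plain Python; the helper name `trim_at_eot` is hypothetical and is not part of KoboldCPP or any other tool.

```python
# Hypothetical helper: not part of KoboldCPP or any library.
EOT = "<|eot_id|>"  # the EOS token named in the note above


def trim_at_eot(text: str, eot: str = EOT) -> str:
    """Return the text before the first end-of-turn marker, if one is present."""
    head, _, _ = text.partition(eot)  # head is the whole string when eot is absent
    return head.rstrip()


if __name__ == "__main__":
    raw = "Hello there!" + EOT + "text that should have been cut off"
    print(trim_at_eot(raw))
```

Most frontends also let you add a custom stop sequence, which achieves the same result without post-processing.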