This model is based on Llama-2-chat-hf and was fine-tuned with QLoRA (PEFT), then quantized to Q4_K with llama.cpp. There is no noticeable performance loss.
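For reference, the quantization step described above roughly follows the standard llama.cpp workflow. This is a sketch, not the exact commands used for this model: the checkpoint directory and output filenames are illustrative, and script/binary names may differ slightly between llama.cpp versions.

```shell
# Convert the merged fine-tuned HF checkpoint to GGUF at FP16.
# "./llama-2-chat-finetuned" is a placeholder for the merged model directory.
python convert_hf_to_gguf.py ./llama-2-chat-finetuned \
    --outfile model-f16.gguf --outtype f16

# Quantize the FP16 GGUF to a Q4_K variant (Q4_K_M shown here).
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

Note that the QLoRA adapter must be merged back into the base weights before conversion, since GGUF conversion operates on a full checkpoint rather than a PEFT adapter.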