---
pipeline_tag: conversational
tags:
- vicuna
- llama
- text-generation-inference
---
Converted for use with [llama.cpp](https://github.com/ggerganov/llama.cpp)

---
- 4-bit quantized
- Needs ~10 GB of CPU RAM
- Won't work with alpaca.cpp or older llama.cpp builds (the new ggml format requires the latest llama.cpp)
- EOS token fix added (download rev1)

---
If you only have 8 GB of RAM, a smaller 7B version of this model is available at https://huggingface.co/eachadea/ggml-vicuna-7b-4bit. The 7B model is over 2x faster and is uncensored, while this 13B model is not.
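
---
A minimal usage sketch, assuming the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings (which wrap llama.cpp) are installed and the quantized file has been downloaded locally; the filename below is an assumption, so point `model_path` at whichever revision you actually downloaded (rev1 recommended for the EOS fix):

```python
from llama_cpp import Llama

# Load the 4-bit quantized model. The filename is an assumption --
# substitute the actual file downloaded from this repo.
llm = Llama(model_path="./ggml-vicuna-13b-4bit-rev1.bin")

# Vicuna expects a Human/Assistant-style prompt.
prompt = "### Human: What is the capital of France?\n### Assistant:"
output = llm(prompt, max_tokens=64, stop=["### Human:"])

# The bindings return an OpenAI-style completion dict.
print(output["choices"][0]["text"])
```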