Llama 3 8B Instruct - Q8 vs FP16 vs FP32

by Hearcharted

Hi Bartowski, do you know if there is much difference in response quality between them?
Many thanks in advance...

There shouldn't be much difference, but any quantization will affect the output to some degree.

If you can comfortably run F16, you should.
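One rough way to see why any quantization costs some precision is to round-trip a set of weights through FP16 and through an 8-bit block format and compare the error. The sketch below is a hypothetical illustration, loosely modeled on llama.cpp's Q8_0 layout (blocks of 32 weights, each block stored as int8 values plus one half-precision scale). The block size, the Gaussian weight distribution, the helper names, and the use of Python's double-precision floats as a stand-in for the FP32 reference are all assumptions. Weight error is also only a proxy and says nothing direct about response quality.

```python
import random
import struct

BLOCK_SIZE = 32  # assumption: Q8_0-style blocks of 32 weights


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE half-precision value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def q8_roundtrip(weights):
    """Hypothetical 8-bit quantizer: one FP16 scale per block, int8 values."""
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        # The scale itself is stored in half precision, as in Q8_0.
        scale = to_fp16(amax / 127) if amax > 0 else 1.0
        # Each weight becomes an integer clamped to [-127, 127].
        q = [max(-127, min(127, round(w / scale))) for w in block]
        out.extend(v * scale for v in q)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
# Synthetic "weights"; double-precision floats stand in for the FP32 original.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

fp16 = [to_fp16(w) for w in weights]
q8 = q8_roundtrip(weights)

print(f"FP16 RMS error: {rms_error(weights, fp16):.2e}")
print(f"Q8   RMS error: {rms_error(weights, q8):.2e}")
```

With these assumptions the 8-bit round trip should show a noticeably larger (but still small) error than FP16, which matches the advice to prefer F16 when it fits in memory.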

Thank you so much for your time, gentleman 🎩
