Loading model in 8bit
#37
by
abhi24
- opened
Does loading the model in 8-bit always lead to poorer quality, at least compared to the original model?
Can someone briefly describe what happens when we load it in 8-bit?
Instead of working with 16-bit floating-point numbers for the weights, you work with 8-bit integers. These have a much smaller range and precision, so the math is less accurate wherever it is done in 8-bit. It isn't necessarily faster, either, but it takes half the memory. It doesn't necessarily make the results much worse; some have experimented even with 4-bit math. For example, the Dolly 12B model runs on an A10 in 8-bit and the results seem pretty fine to me.
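To make the trade-off concrete, here is a minimal sketch of one common 8-bit scheme (absmax quantization, roughly what libraries like bitsandbytes do per block of weights): each weight tensor is mapped to int8 codes plus one floating-point scale, then dequantized back for the math. The weight values here are made up for illustration.

```python
import numpy as np

# Hypothetical fp16 weight values (illustrative only)
w = np.array([0.12, -1.5, 0.003, 0.9], dtype=np.float16)

# Absmax quantization: scale so the largest magnitude maps to 127
scale = float(np.abs(w).max()) / 127.0
q = np.round(w.astype(np.float32) / scale).astype(np.int8)

# Stored form: int8 codes (1 byte each) + one fp scale,
# vs. 2 bytes per weight in fp16 -- roughly half the memory.
w_hat = q.astype(np.float32) * scale  # dequantized approximation

print(q)      # the int8 codes
print(w_hat)  # close to w, but small values lose relative precision
```

Note how the smallest weight (0.003) rounds to code 0 and is lost entirely, while the largest weights survive almost exactly: that is the "smaller range and precision" cost, paid in exchange for the memory savings.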
Thank you for the insightful reply.
abhi24
changed discussion status to
closed