Could we get an fp16 version? This thing is huge...
#6 · opened by YokaiKoibito
I appreciate having this 32-bit version available for anyone who wants to do further training, but I'm never going to run this in 32-bit for inference, so downloading a 32-bit version and then downsizing it to 4-, 8-, or 16-bit locally is a huge waste of time and bandwidth.
Your model card says this is fp16. But it's 29 shards of around 9.3GB each (roughly 270GB total), which for a 70B-parameter model works out to about 4 bytes per parameter, so it's clearly actually fp32.
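Back-of-the-envelope, in case anyone wants to check the math (the shard size and parameter count here are my rough numbers, not exact repo metadata):

```python
# Rough size check: checkpoint bytes per parameter.
total_bytes = 29 * 9.3e9     # 29 shards at ~9.3 GB each (approximate)
params = 70e9                # ~70B parameters for Llama-2-70B
print(total_bytes / params)  # ~3.85 bytes/param -> fp32 (fp16 would be ~2 bytes/param)
```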
I made an fp16 copy at YokaiKoibito/llama2_70b_chat_uncensored by loading the model onto CPU as torch.float16 and re-exporting it. It is indeed half the size. If you publish an fp16 copy here, I'll take mine down.
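For reference, the conversion was just a load/save round-trip; something like this sketch (model_id and out_dir are placeholders for this repo's id and a local directory, and low_cpu_mem_usage is optional):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ORIGINAL_REPO/llama2_70b_chat_uncensored"  # placeholder for this repo
out_dir = "llama2_70b_chat_uncensored-fp16"            # local output directory

# Load the fp32 checkpoint on CPU, casting the weights to fp16 as they load.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Re-export; the saved shards come out at roughly half the fp32 size.
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)
```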