Could we get an fp16 version? This thing is huge...
#6 · opened by YokaiKoibito
I appreciate having this 32-bit version available for anyone who wants to do further training, but I'm never going to run this in 32-bit for inference, so downloading a 32-bit version and then downsizing it to 4-, 8-, or 16-bit locally is a huge waste of time and bandwidth.
Your model card says this is fp16. But it's 29 shards of around 9.3GB each (roughly 270GB total), which for a 70B-parameter model works out to about 4 bytes per parameter, so it's clearly actually fp32.
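Back-of-the-envelope, in case anyone wants to check the math (the shard size and parameter count here are my rough numbers, not exact repo metadata):

```python
# Rough size check: checkpoint bytes per parameter.
total_bytes = 29 * 9.3e9     # 29 shards at ~9.3 GB each (approximate)
params = 70e9                # ~70B parameters for Llama-2-70B
print(total_bytes / params)  # ~3.85 bytes/param -> fp32 (fp16 would be ~2 bytes/param)
```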
I made an fp16 copy at YokaiKoibito/llama2_70b_chat_uncensored by loading the model onto CPU as torch.float16 and re-exporting it. It is indeed half the size. If you publish an fp16 copy here, I'll take mine down.
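For reference, the conversion was just a load/save round-trip; something like this sketch (model_id and out_dir are placeholders for this repo's id and a local directory, and low_cpu_mem_usage is optional):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ORIGINAL_REPO/llama2_70b_chat_uncensored"  # placeholder for this repo
out_dir = "llama2_70b_chat_uncensored-fp16"            # local output directory

# Load the fp32 checkpoint on CPU, casting the weights to fp16 as they load.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Re-export; the saved shards come out at roughly half the fp32 size.
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)
```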