How did you quantize the model?

by AgeOfAlgorithms

Hi, I'm new to quantizing, and I've been researching ways to quantize the xcodec2 model.
Do you mind sharing how you made the bf16 quantization? Is it possible to create an 8-bit quantization as well?

There isn't really much to this one: simply load the model, cast the weights to bfloat16, and save it again. For 8-bit you can try torch.float8_e4m3fn or torch.float8_e5m2, but I have a feeling it won't work too well.

For safetensors:

import torch
from safetensors.torch import load_file, save_file

# Load the raw tensors as a flat state dict
model = load_file("model.safetensors")

# Cast every fp32 tensor down to bf16; leave other dtypes alone
for k in model:
    if model[k].dtype == torch.float32:
        model[k] = model[k].to(torch.bfloat16)

save_file(model, "model.bf16.safetensors")
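
If you want to experiment with the 8-bit floats, it's the same loop with a different target dtype. A minimal sketch, assuming a PyTorch build recent enough to have the float8 dtypes (2.1+); keep in mind most ops can't run natively in float8, so this is mainly useful for storage:

import torch
from safetensors.torch import load_file, save_file

# Start from the original fp32 state dict again (assumes PyTorch >= 2.1 for float8 dtypes)
model = load_file("model.safetensors")

for k in model:
    if model[k].dtype == torch.float32:
        model[k] = model[k].to(torch.float8_e4m3fn)  # or torch.float8_e5m2

save_file(model, "model.fp8.safetensors")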

For a pt file you load with torch.load instead, and you might need to add some metadata when saving:

# same imports as above
model = torch.load("model.pt", map_location="cpu")  # assumes the .pt holds a flat state dict

# same bf16 cast loop as above
for k in model:
    if model[k].dtype == torch.float32:
        model[k] = model[k].to(torch.bfloat16)

save_file(model, "model.bf16.safetensors", metadata={"format": "pt"})
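
If you want to double-check the conversion, you can reload the converted file and look at the dtypes. A quick sketch using the filename from above:

from safetensors.torch import load_file

converted = load_file("model.bf16.safetensors")
# Every tensor that was fp32 should now be bf16
print({t.dtype for t in converted.values()})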
