ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float8_e4m3fn

#82
by ajtakto - opened

How do you deal with the fact that different layers in ds are in different data types? I'm trying to run the model on GPUs with 60 GB of memory and need to use FSDP.
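
A minimal sketch of one possible workaround, not an official recommendation: FSDP1 flattens all parameters in a wrapped unit into a single buffer, which is why it requires a uniform dtype. One way around the error is to upcast the `torch.float8_e4m3fn` tensors to `torch.bfloat16` before wrapping, at the cost of roughly 2x memory for those tensors. The checkpoint path below is a placeholder, and the snippet assumes the process group is already initialized (e.g. launched with `torchrun`).

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

# Load the checkpoint with its stored dtypes ("path/to/checkpoint" is a placeholder).
model = AutoModelForCausalLM.from_pretrained(
    "path/to/checkpoint",
    torch_dtype="auto",
)

# Upcast any float8 parameters so every tensor FSDP flattens shares a uniform dtype.
for _, param in model.named_parameters():
    if param.dtype == torch.float8_e4m3fn:
        param.data = param.data.to(torch.bfloat16)

# Wrap after the upcast; use_orig_params keeps the original parameter views accessible.
sharded_model = FSDP(model, use_orig_params=True)
```

If upcasting doesn't fit in memory, the newer per-parameter FSDP2 API (`fully_shard` in recent PyTorch releases) shards parameters individually instead of flattening them into one buffer, so it may tolerate mixed dtypes where FSDP1 raises this error; that would need to be verified on your PyTorch version.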
