Converting Phi

#1
by vince62s - opened

Hi,
With OpenNMT-py it's modular: options like parallel_residual=True and Shared_LayerNorm=True are config switches, so the code base does not change.
If I hack my current converters to convert Phi to OpenNMT-py weights, running it in MoE mode should be straightforward, since I already run Mixtral.

Owner

Hi, I'm not familiar with OpenNMT-py but this sounds great. Happy to see the results if you manage to run it!

For some reason the merge renamed the FF layers from fc1/fc2 to w1/w2 to follow the Mixtral naming convention.
I'll look further tomorrow, but IMO it would be better to keep the Phi namings.

Owner

Oops, yeah, I renamed them. Is fc1/fc2 => w1/w2 the only issue with the names? I can change it if that makes it easier for you.

Ideally, if you want to make it work with HF with only slight changes in modeling_phi.py, you may also rename block_sparse_moe => moe and experts => mlp.
Then we just need to add a class MoE(nn.Module) in modeling_phi.py, along the lines of the sketch below.
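A minimal sketch of what that class could look like, assuming a Mixtral-style top-k router over the existing Phi MLP (fc1/fc2). The config fields (num_local_experts, num_experts_per_tok, n_embd) and the mlp_cls argument are assumptions for illustration, not the final modeling_phi.py API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    """Mixtral-style sparse MoE over Phi MLP experts (illustrative sketch)."""

    def __init__(self, config, mlp_cls):
        super().__init__()
        self.num_experts = config.num_local_experts   # assumed config field
        self.top_k = config.num_experts_per_tok       # assumed config field
        # Router: one logit per expert.
        self.gate = nn.Linear(config.n_embd, self.num_experts, bias=False)
        # One fc1/fc2 MLP per expert, kept under `mlp` per the renaming above.
        self.mlp = nn.ModuleList([mlp_cls(config) for _ in range(self.num_experts)])

    def forward(self, hidden_states):
        batch, seq_len, dim = hidden_states.shape
        flat = hidden_states.view(-1, dim)
        # Pick the top-k experts per token and renormalize their weights.
        probs = F.softmax(self.gate(flat), dim=-1)
        weights, selected = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(flat)
        for k in range(self.top_k):
            for e in range(self.num_experts):
                mask = selected[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.mlp[e](flat[mask])
        return out.view(batch, seq_len, dim)
```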

Owner

Cool! Can you confirm that the following is correct?

```python
moe_tensor_name = tensor_name.replace("mlp.fc1.bias", f"moe.mlp.{moe_index}.fc1.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc1.weight", f"moe.mlp.{moe_index}.fc1.weight")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.bias", f"moe.mlp.{moe_index}.fc2.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.weight", f"moe.mlp.{moe_index}.fc2.weight")
```
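For context, a hypothetical sketch of how this replace chain might sit in a merge loop, routing each source model's MLP tensors to its own expert slot (merge_experts, expert_state_dicts, and the dedup rule are illustrative, not the actual merge code):

```python
def merge_experts(expert_state_dicts):
    """Map each source model's mlp.fc1/fc2 tensors to moe.mlp.{i}.* names."""
    merged = {}
    for moe_index, state_dict in enumerate(expert_state_dicts):
        for tensor_name, tensor in state_dict.items():
            moe_tensor_name = tensor_name.replace("mlp.fc1.bias", f"moe.mlp.{moe_index}.fc1.bias")
            moe_tensor_name = moe_tensor_name.replace("mlp.fc1.weight", f"moe.mlp.{moe_index}.fc1.weight")
            moe_tensor_name = moe_tensor_name.replace("mlp.fc2.bias", f"moe.mlp.{moe_index}.fc2.bias")
            moe_tensor_name = moe_tensor_name.replace("mlp.fc2.weight", f"moe.mlp.{moe_index}.fc2.weight")
            # Shared (non-MLP) tensors keep their names; take them from the first model only.
            if moe_tensor_name == tensor_name and moe_index > 0:
                continue
            merged[moe_tensor_name] = tensor
    return merged
```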

I think I got it working. I patched modeling_phi.py with the wrong names, so if you fix the tensor names, I'll push it with the right names.

Hi @vince62s ,

I assume this is not working with the Hugging Face weights of Phi-2. Is it possible to support that?

Not sure what your question is, but I made it work with HF; take a look at the model card.

So there are two implementations of Phi-2: one by Microsoft, which requires trust_remote_code=True, and another that is actually in the official transformers repo. The weights for the latter are available in this repo: susnato/phi-2
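For reference, the two loading paths in question, as standard transformers usage (the Microsoft repo ID microsoft/phi-2 is assumed here; the official-implementation weights are the susnato/phi-2 repo mentioned above):

```python
from transformers import AutoModelForCausalLM

# Microsoft's implementation, which ships its own modeling_phi.py as remote code:
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# The implementation merged into the official transformers repo,
# with matching weights hosted at susnato/phi-2:
model = AutoModelForCausalLM.from_pretrained("susnato/phi-2")
```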

So I was wondering if it would be possible to support this as well. I think it's more or less a matter of copying the MoE class and calling it in the correct places with the correct dimensions.

This would require HF to accept a PR on modeling_phi.py in the official transformers repo, which I don't think is possible at the moment, so it's best to use this repo for now.

vince62s changed discussion status to closed
