Converting Phi

#1
by vince62s - opened

Hi,
With OpenNMT-py it's modular: options like parallel_residual=True and Shared_LayerNorm=True are config switches, so the code base does not change.
If I hack my current converters to convert Phi to OpenNMT-py weights, running it in MoE mode should be straightforward, since I already run Mixtral.

Owner

Hi, I'm not familiar with OpenNMT-py but this sounds great. Happy to see the results if you manage to run it!

For some reason the merge renamed the FF layers from fc1/fc2 to w1/w2 to follow the Mixtral naming convention.
I'll look further tomorrow, but IMO it would be better to keep the Phi namings.

Owner

Oops, yeah, I renamed them. Is fc1/fc2 => w1/w2 the only issue with the names? I can change it if that makes it easier for you.

Ideally, if you want to make it work with HF with only slight changes in modeling_phi.py, you may also rename block_sparse_moe => moe and experts => mlp.
Then we just need to add a class MoE(nn.Module) in modeling_phi.py, along the lines of the sketch below.
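A minimal sketch of what that class could look like, assuming a Mixtral-style top-k router over the existing Phi MLP (fc1/fc2). The config fields (num_local_experts, num_experts_per_tok, n_embd) and the mlp_cls argument are assumptions for illustration, not the final modeling_phi.py API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    """Mixtral-style sparse MoE over Phi MLP experts (illustrative sketch)."""

    def __init__(self, config, mlp_cls):
        super().__init__()
        self.num_experts = config.num_local_experts   # assumed config field
        self.top_k = config.num_experts_per_tok       # assumed config field
        # Router: one logit per expert.
        self.gate = nn.Linear(config.n_embd, self.num_experts, bias=False)
        # One fc1/fc2 MLP per expert, kept under `mlp` per the renaming above.
        self.mlp = nn.ModuleList([mlp_cls(config) for _ in range(self.num_experts)])

    def forward(self, hidden_states):
        batch, seq_len, dim = hidden_states.shape
        flat = hidden_states.view(-1, dim)
        # Pick the top-k experts per token and renormalize their weights.
        probs = F.softmax(self.gate(flat), dim=-1)
        weights, selected = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(flat)
        for k in range(self.top_k):
            for e in range(self.num_experts):
                mask = selected[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.mlp[e](flat[mask])
        return out.view(batch, seq_len, dim)
```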

Owner

Cool! Can you confirm that the following is correct?

```python
moe_tensor_name = tensor_name.replace("mlp.fc1.bias", f"moe.mlp.{moe_index}.fc1.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc1.weight", f"moe.mlp.{moe_index}.fc1.weight")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.bias", f"moe.mlp.{moe_index}.fc2.bias")
moe_tensor_name = moe_tensor_name.replace("mlp.fc2.weight", f"moe.mlp.{moe_index}.fc2.weight")
```
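For context, a hypothetical sketch of how this replace chain might sit in a merge loop, routing each source model's MLP tensors to its own expert slot (merge_experts, expert_state_dicts, and the dedup rule are illustrative, not the actual merge code):

```python
def merge_experts(expert_state_dicts):
    """Map each source model's mlp.fc1/fc2 tensors to moe.mlp.{i}.* names."""
    merged = {}
    for moe_index, state_dict in enumerate(expert_state_dicts):
        for tensor_name, tensor in state_dict.items():
            moe_tensor_name = tensor_name.replace("mlp.fc1.bias", f"moe.mlp.{moe_index}.fc1.bias")
            moe_tensor_name = moe_tensor_name.replace("mlp.fc1.weight", f"moe.mlp.{moe_index}.fc1.weight")
            moe_tensor_name = moe_tensor_name.replace("mlp.fc2.bias", f"moe.mlp.{moe_index}.fc2.bias")
            moe_tensor_name = moe_tensor_name.replace("mlp.fc2.weight", f"moe.mlp.{moe_index}.fc2.weight")
            # Shared (non-MLP) tensors keep their names; take them from the first model only.
            if moe_tensor_name == tensor_name and moe_index > 0:
                continue
            merged[moe_tensor_name] = tensor
    return merged
```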

I think I got it working. I patched modeling_phi.py with the wrong names, so if you fix the tensor names, I'll push it with the right names.

Hi @vince62s ,

I assume this is not working with the Hugging Face weights of Phi-2. Is it possible to support that?

Not sure what your question is, but I made it work with HF; take a look at the model card.

So there are two implementations of Phi-2: one by Microsoft, which requires trust_remote_code=True, and another that is actually in the official transformers repo. The weights for the latter are available in this repo: susnato/phi-2
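For reference, the two loading paths in question, as standard transformers usage (the Microsoft repo ID microsoft/phi-2 is assumed here; the official-implementation weights are the susnato/phi-2 repo mentioned above):

```python
from transformers import AutoModelForCausalLM

# Microsoft's implementation, which ships its own modeling_phi.py as remote code:
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# The implementation merged into the official transformers repo,
# with matching weights hosted at susnato/phi-2:
model = AutoModelForCausalLM.from_pretrained("susnato/phi-2")
```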

So I was wondering if it would be possible to support this as well. I think it's more or less a matter of copying the MoE class and calling it in the correct places with the correct dimensions.

This would require HF to accept a PR on modeling_phi.py in the official transformers repo, which I don't think is possible at the moment, so it's best to use this repo for now.

vince62s changed discussion status to closed
