Script?
Could you please share the edited script with me? I’m having trouble MoE’ing Phi.
This is what I got.
Beware: this will only work with 2 Phis. You might have to tinker with the naming logic for more layers.
- Modify mixtral_moe.py at
/content/mergekit/mergekit/scripts/mixtral_moe.py
using the version from here
- Modify architecture.py
/content/mergekit/mergekit/architecture.py
(you can take this from the link to the commit I have in the description, or from below)
After all the file modifications and the run, you need to replace config.json
with the one from this repo.
After that you need to add modeling_phi.py
and configuration_phi.py
from this repo to your repo.
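The steps above can be sketched as a small copy script. This is only a rough illustration, not part of the original instructions: the function names and the `patched_dir` / `reference_repo_dir` folders (local copies of the modified files and of the reference repo) are hypothetical, and the second custom-code file is assumed to be named configuration_phi.py.

```python
import shutil
from pathlib import Path

# The two mergekit files that get overwritten, relative to the mergekit checkout
# (in the thread the checkout lives at /content/mergekit).
MERGEKIT_TARGETS = {
    "mixtral_moe.py": Path("mergekit/scripts/mixtral_moe.py"),
    "architecture.py": Path("mergekit/architecture.py"),
}

# Files copied into the merged model folder after the merge has run.
# Assumption: the configuration file is called configuration_phi.py.
POST_MERGE_FILES = ["config.json", "modeling_phi.py", "configuration_phi.py"]


def patch_mergekit(patched_dir, mergekit_root):
    """Overwrite mergekit's two files with the modified versions."""
    copied = []
    for name, rel_target in MERGEKIT_TARGETS.items():
        target = Path(mergekit_root) / rel_target
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(patched_dir) / name, target)
        copied.append(target)
    return copied


def finalize_model_dir(reference_repo_dir, model_dir):
    """Copy config.json and the custom Phi code into the merged model folder."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in POST_MERGE_FILES:
        target = model_dir / name
        shutil.copy2(Path(reference_repo_dir) / name, target)
        copied.append(target)
    return copied
```

You would call `patch_mergekit(...)` before running the merge and `finalize_model_dir(...)` afterwards, pointing the arguments at wherever you saved the files.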
Thank you so much, you are the 🐐! I really appreciate it! Thank you!
@Vezora
did you succeed? I am having trouble modifying mixtral_moe.py. What needs to be changed there?
@paulilioaica
Should I use the mixtral_moe.py from here? https://github.com/paulilioaica/Phi-MOE/blob/main/mixtral_moe.py
@vhug just go ahead and replace the whole file with the modified one
@paulilioaica but your mixtral_moe.py uses PHI2_INFO, which leads to a KeyError with the phi2 model: it expects layer names like "transformer.embd.wte.weight", while the phi2 model's layers go by names like "model.embed_tokens.weight".
Is your mixtral_moe.py for phi2 or for phi1?
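To make the mismatch concrete, here is a minimal sketch of a pre-flight check that compares the names a script expects with the names a checkpoint actually contains. It is not mergekit code; `missing_tensor_names` is a hypothetical helper, and the two lists only hold the names quoted above.

```python
def missing_tensor_names(expected, available):
    """Return the expected names that the checkpoint does not contain, in order."""
    available = set(available)
    return [name for name in expected if name not in available]


# Names quoted in this thread; a real checkpoint has many more tensors.
expected_by_script = ["transformer.embd.wte.weight"]
found_in_checkpoint = ["model.embed_tokens.weight"]

missing = missing_tensor_names(expected_by_script, found_in_checkpoint)
if missing:
    # A plain dict lookup on any of these names would fail with a KeyError.
    print("Would raise KeyError for:", missing)
```

Running a check like this before the merge shows at a glance whether the script and the model use the same naming scheme.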
I ran this MergeKitNotebook with the 2 files overwritten and it works fine for me. Are you sure you replaced both architecture.py and mixtral_moe.py?
This is for Phi2
@paulilioaica your script actually works for the models in your merge-config.yaml. I replaced the 2 files, ran mixtral_moe.py, and it works just fine. But now I am confused: I checked the layer names of "phi-2-orange" and of "phi-2" from Microsoft, and they differ. Why is that? Screenshot attached below: the left side is the printed layer names for the phi-2 model from Microsoft, and the right side is the printed layer names for phi-2-orange.
Looks like they merged the official Phi2 code into the Transformers library and deleted the custom code
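For anyone who wants to reproduce this comparison without screenshots, a rough sketch follows. The helper names are hypothetical; `parameter_names` works on any object that exposes `named_parameters()`, which a loaded PyTorch model does.

```python
from collections import Counter


def parameter_names(model):
    """Return the parameter names of any object exposing named_parameters()."""
    return [name for name, _ in model.named_parameters()]


def prefix_summary(names):
    """Count names by their first dotted component, e.g. 'model' or 'transformer'."""
    return Counter(name.split(".", 1)[0] for name in names)


def only_in_first(names_a, names_b):
    """Return the names present in the first list but not in the second, in order."""
    other = set(names_b)
    return [name for name in names_a if name not in other]
```

Assuming the transformers package is installed, each model could be loaded with `AutoModelForCausalLM.from_pretrained(...)` and passed to `parameter_names`; comparing the two `prefix_summary` results then shows whether one model uses "transformer." names and the other "model." names.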
https://huggingface.co/microsoft/phi-2/commit/da135b7268f02aaa1f591fdd29b6be896008c798
@paulilioaica what does that mean?