LoRA Finetuning - Text vs Vision Effects
#54 · opened by brecker
Two questions:
- Is it possible to finetune only the textual aspect of Phi3V without touching the vision component?
- Say I want to retain the model's vision capabilities (I assume they will degrade with finetuning) while finetuning it for function calling etc. The multi-modality of the model is important to me. Which are the best target modules when performing LoRA FT on Phi3V?
- Yes, you can.
- When using LoRA, the official code fine-tunes only the language_model part.
- I'm not exactly sure which modules are best, so I target all of the layers except for "lm_head" (see the sketch below).
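For illustration, here is a minimal sketch of restricting LoRA to the language-model blocks with PEFT. This is not the official recipe; the module names (`model.layers.*`, `qkv_proj`, `o_proj`, `gate_up_proj`, `down_proj`) are assumptions and should be checked against `model.named_modules()` for your checkpoint.

```python
# Sketch: inject LoRA adapters only into the decoder (language) layers,
# leaving the vision tower and img_projection untouched.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# target_modules as a regex: matches only modules under `model.layers.<n>`
# (assumed decoder path), so no adapters land in the vision encoder.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=r"model\.layers\.\d+\.(self_attn\.(qkv_proj|o_proj)|mlp\.(gate_up_proj|down_proj))",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only language-model projections should be trainable
```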
@brecker
here's the link for you:
https://github.com/microsoft/Phi-3CookBook/blob/20d56d79cfd38eb175118ecc961a9b49e2341de2/code/04.Finetuning/vision_finetuning/finetune_hf_trainer_hateful_memes.py#L374-L384
However, the img_projection layer sits inside the vision_model part, so it gets frozen along with it.
If you need to control freezing/unfreezing of all three parts separately, you can use my code (a rough sketch of the idea follows the link below).
https://github.com/2U1/Phi3-Vision-ft
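As a rough illustration of that freeze/unfreeze idea (not the actual code from the repo above), assuming the vision encoder and projector parameters can be identified by the name substrings `vision_embed_tokens` and `img_projection` (verify with `model.named_parameters()`):

```python
# Sketch: toggle requires_grad per block by parameter-name substring.
# Note: img_projection is nested under the vision tower, so it is checked first.
def set_requires_grad(model, train_vision=False, train_projector=True, train_language=True):
    for name, param in model.named_parameters():
        if "img_projection" in name:
            param.requires_grad = train_projector
        elif "vision_embed_tokens" in name:
            param.requires_grad = train_vision
        else:
            param.requires_grad = train_language

# Example: keep the vision encoder frozen, but let the projector and LM train.
# set_requires_grad(model, train_vision=False, train_projector=True, train_language=True)
```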