Finetuning script using HuggingFace (No llama-factory)

#32
by 2U1 - opened

https://github.com/2U1/Qwen2-VL-Finetune

I made a codebase for anyone who wants to use the Hugging Face version to finetune and who, like me, has had difficulty using some of the other frameworks.

This code uses only Hugging Face libraries to fine-tune the 7B and 2B models.

Also, you can set different learning rates for the vision model and the language model (and for the merger as well).
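The per-component learning rates can be done with standard PyTorch optimizer parameter groups. A minimal sketch below, using a toy module; the attribute names (`visual`, `merger`, `language_model`) and the learning-rate values are illustrative assumptions, not the repo's exact code:

```python
import torch
from torch import nn, optim

# Toy stand-in for the model. In a real script you would build the groups
# from the actual Qwen2-VL submodule names; the names and shapes here are
# placeholders for illustration only.
class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.visual = nn.Linear(4, 4)          # stands in for the vision tower
        self.merger = nn.Linear(4, 4)          # stands in for the vision-language merger
        self.language_model = nn.Linear(4, 4)  # stands in for the LLM

model = ToyVLM()

def params_with_prefix(prefix):
    # Collect parameters whose qualified name starts with the given prefix.
    return [p for n, p in model.named_parameters() if n.startswith(prefix)]

# One optimizer, three groups, each with its own learning rate.
optimizer = optim.AdamW([
    {"params": params_with_prefix("visual"), "lr": 2e-6},
    {"params": params_with_prefix("merger"), "lr": 1e-5},
    {"params": params_with_prefix("language_model"), "lr": 1e-5},
])
```

The same grouping pattern works inside a Hugging Face `Trainer` by overriding `create_optimizer`.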

Feedback and issues are welcome!

Thanks for sharing it! Any video demo with this fine-tuning codebase?

@tanliboy I'm working on fine-tuning with video. It will be updated soon!

@tanliboy I've updated the code for video training! Do you need an inference demo with video via CLI or Gradio?

@2U1 Thanks for the scripts for LoRA tuning the model.

I was trying to finetune it on a small dataset of ~2000 samples (single-image, single-turn QA).

I was trying to do it on Kaggle with 29GB RAM and 2 × T4 GPUs with 15GB each, but I always run into CUDA OOM (both with no offload and with only params offloaded) and RAM OOM if both params and optimizer are offloaded to CPU. Is there any way out? What is the suggested compute?

Also, I am using the 2B-param model for now. Can you shed some light on this? Thanks!

@Anu0202 Thanks for your interest!
It takes a lot of memory, so you should use offloading and decrease the max pixel value.
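For the pixel cap: the Qwen2-VL processor accepts `min_pixels`/`max_pixels` arguments that limit how many 28×28 image patches the vision tower sees, so lowering `max_pixels` directly reduces activation memory. A hedged sketch; the specific patch counts below are example values, not recommendations from the repo:

```python
# Budget the image resolution in units of 28x28 patches.
# Fewer patches -> shorter vision sequence -> less GPU memory.
min_pixels = 256 * 28 * 28   # floor: 256 patches
max_pixels = 512 * 28 * 28   # cap: 512 patches (well below the default)

# With these values the processor would be created like so (requires
# downloading the checkpoint, hence left as a comment here):
# from transformers import AutoProcessor
# processor = AutoProcessor.from_pretrained(
#     "Qwen/Qwen2-VL-2B-Instruct",
#     min_pixels=min_pixels,
#     max_pixels=max_pixels,
# )
```

Combining a lower `max_pixels` with DeepSpeed/optimizer offloading is usually what makes small-GPU setups like 2 × T4 feasible.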

Thanks, @2U1 ! Will try it out.
