Please add Sauerkraut to multimodal language models

#1
by stelterlab - opened

Hi!

Have you ever considered fine-tuning one of the VLMs, such as LLaVA or the Phi vision models?

I tried microsoft/Phi-3.5-vision-instruct, which is not bad at English but could use a good portion of your Sauerkraut mix. I think another worthy candidate would be Idefics 3 by Hugging Face, which is also based on Llama 3.1 8B.

See also https://github.com/merveenoyan/smol-vision/blob/main/Idefics_FT.ipynb

Kind regards, @stelterlab
