Please add Sauerkraut to multimodal language models
#1, opened by stelterlab
Hi!
Have you ever considered fine-tuning one of the VLMs, such as LLaVA or the Phi vision models?
I tried microsoft/Phi-3.5-vision-instruct, which is not bad at English but could use a good portion of your Sauerkraut mix. Another worthy candidate could be Idefics 3 by Hugging Face, which is also based on Llama 3.1 8B.
See also: https://github.com/merveenoyan/smol-vision/blob/main/Idefics_FT.ipynb
Kind regards, @stelterlab