Quite exceptional - looking forward to seeing the fork implemented in llama.cpp - but why that fork choice?
The performance of this model is impressive. Can't wait to see the fork properly merged into llama.cpp.
Did you have a reason for forking that way?
It looks like the modifications to clip.cpp are not major; they could have been applied directly without copying the file into examples.
Same with the minicpm-cli file: it's basically a slimmed-down llava-cli with the addition of the <user>
stopword. Why not just use llava-cli and add that stopword?
In any case, very interesting work. My guess is that your choice of ViT was a major reason for that performance?
Hi, thrilled to see the reply from the official author! We currently only use the forked repo, mainly because the code redundancy is a bit high and the code is still in progress. We are working on fully implementing MiniCPM-Llama3-V 2.5's features in llama.cpp and will merge our code in the coming days to open our PR. Looking forward to merging into the official llama.cpp repo soon.
As for the performance improvement, we think it is multifaceted: a powerful ViT is one aspect, but an advanced technical solution contributes as well, such as a more suitable HD (high-resolution image handling) approach. A technical report illustrating more of the technical details is coming soon. Thanks for the attention!