Apply for community grant: Academic project
Hi, we have built a multi-modal ChatGPT-like model named mPLUG-Owl. It is a novel model that equips LLMs with multi-modal abilities through modularized learning of a foundation LLM, a visual knowledge module, and a visual abstractor module. This approach can support multiple modalities and facilitates diverse unimodal and multimodal abilities through modality collaboration. Experimental results show that our model outperforms existing multi-modal models (such as MiniGPT-4), demonstrating mPLUG-Owl's strong instruction-following and visual understanding ability, multi-turn conversation ability, and knowledge reasoning ability. In addition, we observe some unexpected and exciting abilities, such as multi-image correlation and scene-text understanding, which make it possible to apply the model to harder real-world scenarios such as vision-only document comprehension.
The code is available on GitHub: https://github.com/X-PLUG/mPLUG-Owl
We also host a demo on ModelScope: https://modelscope.cn/studios/damo/mPLUG-Owl/summary
We would therefore be happy to host the demo on Hugging Face as well. However, due to limited funding, we cannot afford a GPU-backed demo ourselves. Our demo requires a GPU with 24 GB of memory or more. In return, we would provide the latest multilingual model exclusively on Hugging Face. We hope you can consider our request. Thanks.
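For context, here is a rough back-of-the-envelope sketch of where the memory goes (assuming our 7B LLaMA-based checkpoint served in fp16; the figures are estimates, not measurements):

```python
# Back-of-the-envelope estimate of the demo's GPU memory footprint.
# Assumes the 7B LLaMA-based mPLUG-Owl checkpoint served in fp16;
# the parameter counts below are rough approximations, not measurements.
llm_params      = 7.0e9   # language model parameters (~7B)
visual_params   = 0.4e9   # ViT-L/14 visual encoder + visual abstractor (rough)
bytes_per_param = 2       # fp16 / bf16

weights_gib = (llm_params + visual_params) * bytes_per_param / 1024**3
print(f"model weights alone: ~{weights_gib:.1f} GiB")  # ~13.8 GiB

# On top of the weights, multi-turn and multi-image chats add KV cache,
# image features, and activation memory, plus CUDA/runtime overhead,
# which makes a 16 GB card tight -- hence the 24 GB request.
```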
Hi @MAGAer13, we have assigned a GPU to this Space. Note that GPU grants are provided temporarily and might be removed after some time if usage is very low.
To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus
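For reference, Space hardware can also be inspected and requested programmatically; below is a minimal sketch using huggingface_hub (the Space id and token are placeholders, and paid hardware requested outside a grant is billed to the Space owner):

```python
# Minimal sketch: inspect and request Space hardware with huggingface_hub.
# Assumes a recent huggingface_hub release and write access to the Space;
# the repo id and token below are placeholders.
from huggingface_hub import HfApi, SpaceHardware

api = HfApi(token="hf_xxx")  # token with write access to the Space

# Check the Space's current runtime state and assigned hardware.
runtime = api.get_space_runtime("MAGAer13/mPLUG-Owl")
print(runtime.stage, runtime.hardware)

# Request a 24 GB-class GPU (T4 variants such as T4_MEDIUM are also available).
api.request_space_hardware("MAGAer13/mPLUG-Owl", hardware=SpaceHardware.A10G_SMALL)
```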