Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering Paper • 2411.16863 • Published Nov 25, 2024
ELSA EU Project Collection Dataset and models created within the ELSA (European Lighthouse on Secure and Safe AI) project for the multimedia use case. • 4 items • Updated Nov 25, 2024
LLaVA-MORE Collection LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1 • 2 items • Updated Aug 31, 2024
aimagelab/LLaVA_MORE-llama_3_1-8B-S2-siglip-finetuning Image-Text-to-Text • Updated Aug 16, 2024 • 21 • 2
LLaVA-MORE Collection LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1 • 8 items • Updated Aug 16, 2024 • 1
aimagelab/LLaVA_MORE-llama_3_1-8B-siglip-finetuning Image-Text-to-Text • Updated Aug 16, 2024 • 19 • 1