multimodal - a Hyeonmin Collection

updated 11 days ago

internlm/internlm-xcomposer2-vl-1_8b

Visual Question Answering • Updated Apr 9, 2024 • 631 • 17
openbmb/MiniCPM-V-2

Visual Question Answering • Updated 9 days ago • 4.81k • 441
llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • Updated 16 days ago • 521k • 248
Qwen/Qwen-VL-Chat

Text Generation • Updated Jan 25, 2024 • 21.3k • 348
Qwen/Qwen-VL

Text Generation • Updated Jan 25, 2024 • 31.2k • 222
openbmb/MiniCPM-Llama3-V-2_5

Image-Text-to-Text • Updated 9 days ago • 27.6k • 1.39k
microsoft/Phi-3-vision-128k-instruct

Text Generation • Updated Aug 20, 2024 • 109k • 945
OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • Updated Dec 18, 2024 • 30.3k • 165
Qwen/Qwen2-VL-72B-Instruct

Image-Text-to-Text • Updated 12 days ago • 163k • 264
mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated 29 days ago • 586
llava-hf/llava-1.5-7b-hf

Image-Text-to-Text • Updated 11 days ago • 709k • 223
meta-llama/Llama-3.2-11B-Vision-Instruct

Image-Text-to-Text • Updated Dec 4, 2024 • 2.56M • 1.25k