VLM - a CCMat Collection

CCMat 's Collections

Adapters & Controls

Personalization

Vision

Video

Moe

Transformers & Attention

Gaming

StateSpaceModels

LLMs

TryOn

Audio

Agents

Data

Img Gen Foundational

UI

toread

VLM

VLM

updated Sep 11

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 66
Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 82
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 35
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8 • 39
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30 • 22
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9 • 29