SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 5 days ago • 115
view article Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google 7 days ago • 58
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5, 2024 • 208
view article Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 8 days ago • 89
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 14 days ago • 28
view article Article From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages By Steveeeeeeen and 1 other • 14 days ago • 25
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated 6 days ago • 53
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency By not-lain • 27 days ago • 32
DeepSeek R1 (All Versions) Collection DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 18 days ago • 199
Qwen2.5 Collection The Qwen 2.5 models are a series of AI models trained on 18 trillion tokens, supporting 29 languages and offering advanced features such as instructio • 33 items • Updated Oct 12, 2024 • 7