SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 5 days ago • 115
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 9 days ago • 134
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 18 days ago • 117
The Big Benchmarks Collection Collection Gathering benchmark spaces on the Hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 201
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published 21 days ago • 56
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 27 days ago • 55
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming Paper • 2501.18837 • Published 26 days ago • 9
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 42
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 50
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
Unsloth 4-bit Dynamic Quants Collection Unsloth's Dynamic 4-bit Quants selectively skip quantizing certain parameters, greatly improving accuracy while using <10% more VRAM than BnB 4-bit (see the sketch after this list) • 22 items • Updated 10 days ago • 50
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs Paper • 2410.05295 • Published Oct 3, 2024 • 12
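The selective-quantization idea behind the Unsloth Dynamic 4-bit Quants collection above, keeping a few sensitive modules in higher precision while quantizing the rest to 4-bit, can be sketched with the Transformers `BitsAndBytesConfig` API. This is a minimal illustration under stated assumptions, not Unsloth's actual layer-selection logic; the model ID and the skipped-module list are placeholder examples.

```python
# Minimal sketch of selectively skipping quantization for certain modules
# when loading a model in 4-bit with bitsandbytes via transformers.
# NOTE: the model id and the skipped-module list are hypothetical examples,
# not Unsloth's actual selection heuristic.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 4-bit weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16
    # Keep selected modules un-quantized (higher precision); this
    # parameter is also honored for 4-bit loading.
    llm_int8_skip_modules=["lm_head"],
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",  # example model id; swap for any HF repo
    quantization_config=bnb_config,
    device_map="auto",
)
```

In practice the collection ships pre-quantized checkpoints that can be loaded directly; the config above only illustrates the general "skip some modules" mechanism rather than reproducing Unsloth's quants.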