QVQ Collection QVQ: Qwen models for visual reasoning • 4 items • Updated about 10 hours ago • 11
How to Synthesize Text Data without Model Collapse? Paper • 2412.14689 • Published 6 days ago • 45
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 7 days ago • 103
The Open Source Advantage in Large Language Models (LLMs) Paper • 2412.12004 • Published 8 days ago • 9
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Paper • 2412.10117 • Published 12 days ago • 1
Article 🇪🇺⚖️ EU AI Act: Systemic Risks in the First CoP Draft Comments ⚖️🇪🇺 By yjernite • 12 days ago • 11
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 65
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 12 days ago • 90
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 14 days ago • 49
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published 14 days ago • 38
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Paper • 2412.06673 • Published 15 days ago • 11
Open Image Preferences Collection Containing all artifacts for the Stable Diffusion 3.5L vs Flux Dev image preference community sprint. • 14 items • Updated 6 days ago • 5
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published 19 days ago • 48
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published 18 days ago • 121
Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published 23 days ago • 21