Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation Paper • 2501.04144 • Published 5 days ago • 13
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 189
AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 266 items • Updated about 16 hours ago • 34
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published Dec 11, 2024 • 45
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 32
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23
Article Breaking resolution curse of vision-language models By visheratin • Feb 24, 2024 • 11
Playground v2 Collection Collection of Playground v2 models • 4 items • Updated Dec 6, 2023 • 7
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models Paper • 2407.11213 • Published Jul 15, 2024 • 3
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person Paper • 2407.16224 • Published Jul 23, 2024 • 27
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6, 2024 • 73
Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning Paper • 2403.06728 • Published Mar 11, 2024 • 2
Boximator: Generating Rich and Controllable Motions for Video Synthesis Paper • 2402.01566 • Published Feb 2, 2024 • 26
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation Paper • 2311.16492 • Published Nov 27, 2023 • 2
Text Promptable Surgical Instrument Segmentation with Vision-Language Models Paper • 2306.09244 • Published Jun 15, 2023 • 2
HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation Paper • 2303.15994 • Published Mar 28, 2023 • 2