-
Instruct-Imagen: Image Generation with Multi-modal Instruction
Paper • 2401.01952 • Published • 30 -
ODIN: A Single Model for 2D and 3D Perception
Paper • 2401.02416 • Published • 11 -
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 20 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 11
Collections
Discover the best community collections!
Collections including paper arxiv:2404.02747
-
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 27 -
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 33 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 11 -
3D Congealing: 3D-Aware Image Alignment in the Wild
Paper • 2404.02125 • Published • 7
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 64 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 20 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 11 -
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 20
-
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Paper • 2403.12943 • Published • 14 -
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 41 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 11 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 20
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 3 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 6 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
-
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models
Paper • 2312.09608 • Published • 13 -
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 69 -
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
Paper • 2310.17994 • Published • 8 -
Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss
Paper • 2401.02677 • Published • 21
-
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Paper • 2306.07967 • Published • 24 -
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Paper • 2306.07954 • Published • 113 -
TryOnDiffusion: A Tale of Two UNets
Paper • 2306.08276 • Published • 73 -
Seeing the World through Your Eyes
Paper • 2306.09348 • Published • 33
-
Generative Powers of Ten
Paper • 2312.02149 • Published • 4 -
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Paper • 2312.01409 • Published • 8 -
SANeRF-HQ: Segment Anything for NeRF in High Quality
Paper • 2312.01531 • Published • 5 -
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper • 2312.04410 • Published • 14