MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine Paper • 2408.02900 • Published Aug 6, 2024 • 28
The Geometry of Tokens in Internal Representations of Large Language Models Paper • 2501.10573 • Published 10 days ago • 8
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 18 days ago • 85
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding Paper • 2501.07783 • Published 14 days ago • 7
Cosmos Tokenizer Collection • A suite of image and video tokenizers • 13 items • Updated 10 days ago • 37
Generalized Gaussian Model for Learned Image Compression Paper • 2411.19320 • Published Nov 28, 2024 • 1
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token Paper • 2412.06676 • Published Dec 9, 2024 • 9
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model Paper • 2411.17459 • Published Nov 26, 2024 • 10
Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? Paper • 2406.10743 • Published Jun 15, 2024 • 1
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings Paper • 2411.08017 • Published Nov 12, 2024 • 11
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 28
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation Paper • 2411.04967 • Published Nov 7, 2024 • 1
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024 • 37
You Don't Need Data-Augmentation in Self-Supervised Learning Paper • 2406.09294 • Published Jun 13, 2024 • 1