LMD: Faster Image Reconstruction with Latent Masking Diffusion
Abstract
As a class of fruitful approaches, diffusion probabilistic models (DPMs) have shown excellent advantages in high-resolution image reconstruction. On the other hand, masked autoencoders (MAEs), as popular self-supervised vision learners, have demonstrated simpler and more effective image reconstruction and transfer capabilities on downstream tasks. However, both require extremely high training costs, either due to inherently high temporal dependence (i.e., excessively long diffusion steps) or artificially low spatial dependence (i.e., a human-specified high mask ratio, such as 0.75). To this end, this paper presents LMD, a faster image reconstruction framework with latent masking diffusion. First, we propose to project and reconstruct images in latent space through a pre-trained variational autoencoder, which is theoretically more efficient than operating in pixel space. Then, we combine the advantages of MAEs and DPMs to design a progressive masking diffusion model, which gradually increases the masking proportion via three different schedulers and reconstructs the latent features from simple to difficult, without sequentially performing denoising diffusion as in DPMs or using a fixed high masking ratio as in MAEs, thereby alleviating the high training-time burden. Our approach enables learning high-capacity models, accelerates their training (by 3x or more), and barely reduces the original accuracy. Inference speed on downstream tasks also significantly outperforms previous approaches.
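To illustrate the idea of progressively increasing the masking proportion, here is a minimal sketch of what such schedulers could look like. The abstract does not specify the exact scheduler forms or hyperparameters, so the function name, the linear/cosine/exponential variants, and the ratio bounds below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def mask_ratio(step, total_steps, r_min=0.1, r_max=0.75, mode="linear"):
    """Hypothetical progressive mask-ratio scheduler: the ratio ramps from
    r_min (easy, few latent patches hidden) to r_max (hard) over training."""
    t = min(max(step / total_steps, 0.0), 1.0)  # normalized training progress in [0, 1]
    if mode == "linear":
        ramp = t
    elif mode == "cosine":
        ramp = 0.5 * (1.0 - math.cos(math.pi * t))  # slow start and slow end
    elif mode == "exponential":
        ramp = (math.exp(t) - 1.0) / (math.e - 1.0)  # slow start, fast end
    else:
        raise ValueError(f"unknown scheduler: {mode}")
    return r_min + (r_max - r_min) * ramp

# Example: reconstruction difficulty grows as training proceeds.
for step in (0, 5000, 10000):
    print(step, round(mask_ratio(step, 10000, mode="cosine"), 3))
```

Under this reading, early training resembles a low-mask-ratio MAE objective on VAE latents, and later training approaches the harder high-mask-ratio regime, rather than running a long sequential denoising chain.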