[NeurIPS24] FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

https://arxiv.org/abs/2410.22655


[NEWS] [9.26] Our FlowDCN is accepted by NeurIPS 2024!

[NEWS] [11.22] 🍺 Our FlowDCN models and code are now available in the official repo!

Pretrained Models

Our models consistently achieve state-of-the-art sFID compared to SiT/DiT.

Metrics

Our models consistently have fewer parameters and GFLOPs than their Transformer counterparts. Our code also supports LogNorm and VAR (Various Aspect Ratio training).
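LogNorm most likely refers to logit-normal timestep sampling for flow-matching training; this is our reading, not something the card states. A minimal sketch under that assumption: draw n from a normal distribution and map it through a sigmoid, so training concentrates on intermediate timesteps instead of sampling t uniformly. The function names and the default `mean`/`std` are hypothetical, not FlowDCN's API.

```python
import math
import random


def sample_lognorm_t(batch_size, mean=0.0, std=1.0, rng=None):
    """Draw flow-matching timesteps t in (0, 1) from a logit-normal distribution.

    Assumption: "LogNorm" means t = sigmoid(n) with n ~ Normal(mean, std).
    """
    rng = rng or random.Random()
    samples = []
    for _ in range(batch_size):
        n = rng.gauss(mean, std)
        samples.append(1.0 / (1.0 + math.exp(-n)))  # logistic sigmoid maps R -> (0, 1)
    return samples


def sample_uniform_t(batch_size, rng=None):
    """Baseline for comparison: uniform timesteps in [0, 1)."""
    rng = rng or random.Random()
    return [rng.random() for _ in range(batch_size)]


rng = random.Random(0)
ts = sample_lognorm_t(10000, rng=rng)
# Share of timesteps in the middle of the trajectory (uniform sampling gives about 0.5).
mid = sum(0.25 < t < 0.75 for t in ts) / len(ts)
print(f"fraction of t in (0.25, 0.75): {mid:.2f}")
```

Shifting `mean` moves the mass toward earlier or later timesteps, and a smaller `std` sharpens it around the centre.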

Model-iters          Resolution  Solver           NFE-CFG  FID   sFID  Params
FlowDCN-S-400k       256x256     EulerSDE-250     250x2    54.6  8.8   30.3M
FlowDCN-B-400k       256x256     EulerSDE-250     250x2    28.5  6.09  120M
VAR-FlowDCN-B-400k   256x256     EulerSDE-250     250x2    23.6  7.72  120M
FlowDCN-L-400k       256x256     EulerSDE-250     250x2    13.8  4.69  421M
FlowDCN-XL-2M        256x256     EulerODE-250     250x2    2.01  4.33  618M
FlowDCN-XL-2M        256x256     EulerSDE-250     250x2    2.00  4.37  618M
FlowDCN-XL-2M        256x256     NeuralSolver-10  10x2     2.35  5.07  618M
FlowDCN-XL-100k      512x512     EulerODE-50      50x2     2.76  5.29  618M
FlowDCN-XL-100k      512x512     EulerSDE-250     250x2    2.44  4.53  618M
FlowDCN-XL-100k      512x512     NeuralSolver-10  10x2     2.77  4.68  618M
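We read the NFE-CFG column as solver steps times two model calls per step: classifier-free guidance evaluates a conditional and an unconditional prediction at every step, so 250x2 means 500 forward passes. The sketch below is a hypothetical Euler ODE sampler with guidance. It assumes the linear path x_t = (1 - t) * noise + t * data with t running from 0 to 1, and `toy_model` stands in for a trained velocity network. It is not the repository's EulerODE, EulerSDE, or NeuralSolver implementation.

```python
def euler_ode_cfg(model, x0, label, num_steps, cfg_scale):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps.

    Each step calls the model twice (conditional + unconditional), so the
    total number of function evaluations is num_steps * 2.
    Returns (final sample, number of model calls).
    """
    x = list(x0)
    dt = 1.0 / num_steps
    calls = 0
    for i in range(num_steps):
        t = i * dt
        v_cond = model(x, t, label)
        v_uncond = model(x, t, None)  # label=None means "unconditional"
        calls += 2
        # Classifier-free guidance: extrapolate away from the unconditional velocity.
        v = [vu + cfg_scale * (vc - vu) for vc, vu in zip(v_cond, v_uncond)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, calls


def toy_model(x, t, label):
    """Hypothetical velocity field: straight-line flow toward `label` (or 0 if None).

    On the path x_t = (1 - t) * noise + t * target, the velocity at x_t is
    (target - x_t) / (1 - t). Euler steps stop at t = 1 - dt, so 1 - t > 0.
    """
    target = label if label is not None else [0.0] * len(x)
    return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]


sample, nfe = euler_ode_cfg(toy_model, [0.3, -1.2], [1.0, 2.0], num_steps=10, cfg_scale=1.0)
print(nfe)  # 20, i.e. "10x2" in the NFE-CFG column
```

With `cfg_scale=1.0` the guided velocity reduces to the conditional one; larger scales push samples further toward the condition while keeping the same doubled NFE.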

Visualizations


Various Resolution Extension

Models            256x256              320x320              224x448              160x480
                  FID    sFID   IS     FID    sFID   IS     FID    sFID   IS     FID    sFID   IS
DiT-B             44.83  8.49   32.05  95.47  108.68 18.38  109.1  110.71 14.00  143.8  122.81 8.93
  with EI         44.83  8.49   32.05  81.48  62.25  20.97  133.2  72.53  11.11  160.4  93.91  7.30
  with PI         44.83  8.49   32.05  72.47  54.02  24.15  133.4  70.29  11.73  156.5  93.80  7.80
FiT-B (+VAR)      36.36  11.08  40.69  61.35  30.71  31.01  44.67  24.09  37.1   56.81  22.07  25.25
  with VisionYaRN 36.36  11.08  40.69  44.76  38.04  44.70  41.92  42.79  45.87  62.84  44.82  27.84
  with VisionNTK  36.36  11.08  40.69  57.31  31.31  33.97  43.84  26.25  39.22  56.76  24.18  26.40
FlowDCN-B         28.5   6.09   51     34.4   27.2   52.2   71.7   62.0   23.7   211    111    5.83
FlowDCN-B (+VAR)  23.6   7.72   62.8   29.1   15.8   69.5   31.4   17.0   62.4   44.7   17.8   35.8
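The (+VAR) rows indicate that Various Aspect Ratio training is what keeps FlowDCN-B usable at non-square resolutions (at 160x480, FID drops from 211 to 44.7). How FlowDCN groups its training data is not described here. One common approach is aspect-ratio bucketing, sketched below under hypothetical assumptions: each bucket keeps roughly a 256x256 pixel budget, sides are rounded to multiples of 16, and each image goes to the bucket whose aspect ratio is closest in log space. The ratio list and function names are illustrative only.

```python
import math


def make_buckets(base=256, patch=16, ratios=(1 / 3, 1 / 2, 3 / 4, 1.0, 4 / 3, 2.0, 3.0)):
    """Build (height, width) buckets with roughly base*base pixels each.

    Hypothetical scheme: for each ratio r = width / height, solve h * w = base**2,
    then round both sides to a multiple of `patch`.
    """
    area = base * base
    buckets = []
    for r in ratios:
        h = math.sqrt(area / r)
        w = h * r
        h = max(patch, int(round(h / patch)) * patch)
        w = max(patch, int(round(w / patch)) * patch)
        buckets.append((h, w))
    return buckets


def assign_bucket(height, width, buckets):
    """Return the bucket whose aspect ratio is closest (in log space) to the image's."""
    target = math.log(width / height)
    return min(buckets, key=lambda hw: abs(math.log(hw[1] / hw[0]) - target))


buckets = make_buckets()
print(buckets)
print(assign_bucket(1080, 1920, buckets))  # a 16:9 image lands in the widest bucket close to 16:9
```

In such a scheme, each training batch would typically be drawn from a single bucket so that every sample in the batch shares one (height, width).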

Citation

@inproceedings{
wang2024exploring,
title={Exploring {DCN}-like architecture for fast image generation with arbitrary resolution},
author={Shuai Wang and Zexian Li and Tianhui Song and Xubin Li and Tiezheng Ge and Bo Zheng and Limin Wang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=e57B7BfA2B}
}