[NeurIPS24] FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution
https://arxiv.org/abs/2410.22655
[NEWS] [9.26] Our FlowDCN has been accepted by NeurIPS 2024!
[NEWS] [11.22] Our FlowDCN models and code are now available in the official repo!
Pretrained Models
Our models consistently achieve state-of-the-art results on the sFID metric compared with SiT/DiT.
Metrics
Our models consistently have fewer parameters and GFLOPs than their Transformer counterparts. Our code also supports LogNorm and VAR (Various Aspect Ratio training).
Model-iters | Resolution | Solver | NFE-CFG | FID | sFID | Params |
---|---|---|---|---|---|---|
FlowDCN-S-400k | 256x256 | EulerSDE-250 | 250x2 | 54.6 | 8.8 | 30.3M |
FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 28.5 | 6.09 | 120M |
VAR-FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 23.6 | 7.72 | 120M |
FlowDCN-L-400k | 256x256 | EulerSDE-250 | 250x2 | 13.8 | 4.69 | 421M |
FlowDCN-XL-2M | 256x256 | EulerODE-250 | 250x2 | 2.01 | 4.33 | 618M |
FlowDCN-XL-2M | 256x256 | EulerSDE-250 | 250x2 | 2.00 | 4.37 | 618M |
FlowDCN-XL-2M | 256x256 | NeuralSolver-10 | 10x2 | 2.35 | 5.07 | 618M |
FlowDCN-XL-100k | 512x512 | EulerODE-50 | 50x2 | 2.76 | 5.29 | 618M |
FlowDCN-XL-100k | 512x512 | EulerSDE-250 | 250x2 | 2.44 | 4.53 | 618M |
FlowDCN-XL-100k | 512x512 | NeuralSolver-10 | 10x2 | 2.77 | 4.68 | 618M |
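The NFE-CFG column in the table above lists values such as 250x2 and 10x2. One plausible reading, assumed here rather than taken from the official code, is "solver steps x 2 network evaluations per step", because classifier-free guidance (CFG) needs both a conditional and an unconditional prediction at every step. The sketch below shows this accounting with a fixed-step Euler ODE sampler. `euler_ode_cfg` and `toy_velocity` are hypothetical names for illustration, the toy velocity field stands in for the real network, and the time direction (t from 0 to 1) is an assumption.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical model signature: (state, time, conditional?) -> velocity
Velocity = Callable[[Vector, float, bool], Vector]


def euler_ode_cfg(
    velocity: Velocity,
    x: Vector,
    num_steps: int,
    cfg_scale: float,
) -> Tuple[Vector, int]:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of function evaluations (NFE).
    """
    nfe = 0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v_cond = velocity(x, t, True)     # class-conditional prediction
        v_uncond = velocity(x, t, False)  # unconditional prediction
        nfe += 2                          # two evaluations per solver step
        # Classifier-free guidance: move from the unconditional
        # prediction toward the conditional one, scaled by cfg_scale.
        v = [u + cfg_scale * (c - u) for c, u in zip(v_cond, v_uncond)]
        # Explicit Euler update.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, nfe


def toy_velocity(x: Vector, t: float, conditional: bool) -> Vector:
    # Stand-in for the network (assumption): a constant velocity field.
    target = 1.0 if conditional else 0.0
    return [target for _ in x]


samples, nfe = euler_ode_cfg(toy_velocity, [0.0, 2.0], num_steps=250, cfg_scale=1.5)
print(nfe)  # 500 evaluations, i.e. 250 steps x 2
```

Under this reading, the NeuralSolver-10 rows would cost 20 network evaluations per sample against 500 for the 250-step Euler rows, which is the trade-off to weigh against the FID and sFID differences reported in the table.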
Visualizations
Various Resolution Extension
Models | 256x256 FID | 256x256 sFID | 256x256 IS | 320x320 FID | 320x320 sFID | 320x320 IS | 224x448 FID | 224x448 sFID | 224x448 IS | 160x480 FID | 160x480 sFID | 160x480 IS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DiT-B | 44.83 | 8.49 | 32.05 | 95.47 | 108.68 | 18.38 | 109.1 | 110.71 | 14.00 | 143.8 | 122.81 | 8.93 |
with EI | 44.83 | 8.49 | 32.05 | 81.48 | 62.25 | 20.97 | 133.2 | 72.53 | 11.11 | 160.4 | 93.91 | 7.30 |
with PI | 44.83 | 8.49 | 32.05 | 72.47 | 54.02 | 24.15 | 133.4 | 70.29 | 11.73 | 156.5 | 93.80 | 7.80 |
FiT-B (+VAR) | 36.36 | 11.08 | 40.69 | 61.35 | 30.71 | 31.01 | 44.67 | 24.09 | 37.1 | 56.81 | 22.07 | 25.25 |
with VisionYaRN | 36.36 | 11.08 | 40.69 | 44.76 | 38.04 | 44.70 | 41.92 | 42.79 | 45.87 | 62.84 | 44.82 | 27.84 |
with VisionNTK | 36.36 | 11.08 | 40.69 | 57.31 | 31.31 | 33.97 | 43.84 | 26.25 | 39.22 | 56.76 | 24.18 | 26.40 |
FlowDCN-B | 28.5 | 6.09 | 51 | 34.4 | 27.2 | 52.2 | 71.7 | 62.0 | 23.7 | 211 | 111 | 5.83 |
FlowDCN-B (+VAR) | 23.6 | 7.72 | 62.8 | 29.1 | 15.8 | 69.5 | 31.4 | 17.0 | 62.4 | 44.7 | 17.8 | 35.8 |
Citation
@inproceedings{
wang2024exploring,
title={Exploring {DCN}-like architecture for fast image generation with arbitrary resolution},
author={Shuai Wang and Zexian Li and Tianhui Song and Xubin Li and Tiezheng Ge and Bo Zheng and Limin Wang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=e57B7BfA2B}
}