---
base_model:
- tianweiy/DMD2
- ByteDance/Hyper-SD
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
- text-to-image
- stable-diffusion
- sdxl
- adversarial diffusion distillation
---
# NitroFusion
<!-- > [**NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training**](), -->
> **NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training**
> 
> Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song

[[arXiv Paper]](https://arxiv.org/abs/2412.02030) [[Project Page]](https://chendaryen.github.io/NitroFusion.github.io/)

<!-- GitHub Repository: []() -->

![](./assets/banner.jpg)


## News
* 04 Dec 2024: [Paper](https://arxiv.org/abs/2412.02030) is released on arXiv, and the [project page](https://chendaryen.github.io/NitroFusion.github.io/) is now public.
* 30 Nov 2024: Our single-step text-to-image demo is publicly available on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).
* 29 Nov 2024: Released two checkpoints: **NitroSD-Realism** and **NitroSD-Vibrant**.


## Online Demos
Try the NitroFusion single-step text-to-image demo on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).

## Model Overview
- `nitrosd-realism_unet.safetensors`: Produces photorealistic images with fine details. 
- `nitrosd-vibrant_unet.safetensors`: Offers vibrant, saturated color characteristics.
- Both models support 1 to 4 inference steps.  


## Usage

First, implement the timestep-shifted scheduler required for multi-step inference:
```python
from diffusers import LCMScheduler


class TimestepShiftLCMScheduler(LCMScheduler):
    """LCMScheduler variant that conditions the UNet on shifted (smaller) timesteps."""

    def __init__(self, *args, shifted_timestep=250, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_to_config(shifted_timestep=shifted_timestep)

    def set_timesteps(self, *args, **kwargs):
        super().set_timesteps(*args, **kwargs)
        # Keep the original schedule for the denoising math, but expose timesteps
        # rescaled into [0, shifted_timestep] for the UNet forward pass.
        self.origin_timesteps = self.timesteps.clone()
        self.shifted_timesteps = (self.timesteps * self.config.shifted_timestep / self.config.num_train_timesteps).long()
        self.timesteps = self.shifted_timesteps

    def step(self, model_output, timestep, sample, generator=None, return_dict=True):
        if self.step_index is None:
            self._init_step_index(timestep)
        # Step with the original (unshifted) schedule, then restore the shifted one.
        self.timesteps = self.origin_timesteps
        output = super().step(model_output, timestep, sample, generator, return_dict)
        self.timesteps = self.shifted_timesteps
        return output
```
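
To see what the shift does, the following optional check (a sketch; the exact values depend on your `diffusers` version and the LCM schedule) prints the original schedule next to the shifted one that is fed to the UNet:
```python
# Optional sanity check: compare the original LCM schedule with the shifted one.
# Assumes the TimestepShiftLCMScheduler class defined above.
scheduler = TimestepShiftLCMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="scheduler",
    shifted_timestep=250,
)
scheduler.config.original_inference_steps = 4
scheduler.set_timesteps(num_inference_steps=4)

print(scheduler.origin_timesteps)  # original schedule, e.g. tensor([999, 749, 499, 249])
print(scheduler.timesteps)         # shifted schedule seen by the UNet, e.g. tensor([249, 187, 124, 62])
```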


We can then use the `diffusers` pipeline:
```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
# Load model.
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ChenDY/NitroFusion"

# NitroSD-Realism
ckpt = "nitrosd-realism_unet.safetensors"
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=250)
scheduler.config.original_inference_steps = 4

# # NitroSD-Vibrant
# ckpt = "nitrosd-vibrant_unet.safetensors"
# unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
# unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=500)
# scheduler.config.original_inference_steps = 4
pipe = DiffusionPipeline.from_pretrained(
    base_model_id,
    unet=unet,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "a photo of a cat"
image = pipe(
    prompt=prompt,
    num_inference_steps=1,  # NitroSD-Realism and NitroSD-Vibrant both support 1 - 4 inference steps.
    guidance_scale=0,
).images[0]
```
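
For multi-step inference, only `num_inference_steps` changes. A minimal sketch, reusing the `pipe` built above (the fixed `torch.Generator` seed is illustrative, not from the original example, and only makes runs reproducible):
```python
# Multi-step inference, reusing the `pipe` object from the snippet above.
# The seed value is arbitrary.
generator = torch.Generator("cuda").manual_seed(0)

image = pipe(
    prompt="a photo of a cat",
    num_inference_steps=4,  # any value from 1 to 4
    guidance_scale=0,       # CFG stays disabled, as in the single-step example
    generator=generator,
).images[0]

image.save("nitrosd_realism_4step.png")
```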

## License

NitroSD-Realism is released under the [cc-by-nc-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) license, following its base model *DMD2*.

NitroSD-Vibrant is released under the [openrail++](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) license.

<!-- ## Contact 

Feel free to contact us if you have any questions about the paper!

Dar-Yen Chen [@surrey.ac.uk](mailto:@surrey.ac.uk)

## Citation 

If you find NitroFusion useful or relevant to your research, please kindly cite our papers:

```bib

``` -->