---
base_model:
- tianweiy/DMD2
- ByteDance/Hyper-SD
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
- text-to-image
- stable-diffusion
- sdxl
- adversarial diffusion distillation
---

# NitroFusion

> **NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training**
>
> Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song

[[arXiv Paper]](https://arxiv.org/abs/2412.02030) [[Project Page]](https://chendaryen.github.io/NitroFusion.github.io/)
![](./assets/banner.jpg)

## News

* 04 Dec 2024: [Paper](https://arxiv.org/abs/2412.02030) is released on arXiv, and the [project page](https://chendaryen.github.io/NitroFusion.github.io/) is now public.
* 30 Nov 2024: Our single-step text-to-image demo is publicly available on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).
* 29 Nov 2024: Released two checkpoints: **NitroSD-Realism** and **NitroSD-Vibrant**.

## Online Demos

Try the NitroFusion single-step text-to-image demo on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).

## Model Overview

- `nitrosd-realism_unet.safetensors`: Produces photorealistic images with fine details.
- `nitrosd-vibrant_unet.safetensors`: Offers vibrant, saturated color characteristics.
- Both models support 1 to 4 inference steps.

## Usage

First, we implement the timestep-shifted scheduler that NitroSD uses for multi-step inference:

```python
from diffusers import LCMScheduler

class TimestepShiftLCMScheduler(LCMScheduler):
    def __init__(self, *args, shifted_timestep=250, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_to_config(shifted_timestep=shifted_timestep)

    def set_timesteps(self, *args, **kwargs):
        super().set_timesteps(*args, **kwargs)
        # Keep the original schedule for the denoising math, and expose a
        # rescaled copy in [0, shifted_timestep] that the UNet is conditioned on.
        self.origin_timesteps = self.timesteps.clone()
        self.shifted_timesteps = (self.timesteps * self.config.shifted_timestep / self.config.num_train_timesteps).long()
        self.timesteps = self.shifted_timesteps

    def step(self, model_output, timestep, sample, generator=None, return_dict=True):
        if self.step_index is None:
            self._init_step_index(timestep)
        # Step with the original (unshifted) timesteps, then restore the shifted view.
        self.timesteps = self.origin_timesteps
        output = super().step(model_output, timestep, sample, generator, return_dict)
        self.timesteps = self.shifted_timesteps
        return output
```
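
To sanity-check the shift, you can print both schedules side by side. A minimal sketch, assuming the standard 1000 training timesteps; exact values may vary slightly across `diffusers` versions:

```python
# 4-step schedule: step() consumes the original timesteps,
# while the UNet sees the rescaled (shifted) ones.
sched = TimestepShiftLCMScheduler(num_train_timesteps=1000, shifted_timestep=250)
sched.set_timesteps(num_inference_steps=4, original_inference_steps=4)
print(sched.origin_timesteps)  # e.g. tensor([999, 749, 499, 249])
print(sched.timesteps)         # e.g. tensor([249, 187, 124,  62])
```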

We can then set up the `diffusers` pipeline:

```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Load model.
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ChenDY/NitroFusion"

# NitroSD-Realism
ckpt = "nitrosd-realism_unet.safetensors"
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=250)
scheduler.config.original_inference_steps = 4

# # NitroSD-Vibrant
# ckpt = "nitrosd-vibrant_unet.safetensors"
# unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
# unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=500)
# scheduler.config.original_inference_steps = 4

pipe = DiffusionPipeline.from_pretrained(
    base_model_id,
    unet=unet,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "a photo of a cat"
image = pipe(
    prompt=prompt,
    num_inference_steps=1,  # NitroSD-Realism and -Vibrant both support 1 to 4 inference steps.
    guidance_scale=0,       # Distilled models run without classifier-free guidance.
).images[0]
```
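
Both checkpoints accept anywhere from 1 to 4 steps, so you can trade a little speed for extra detail without reloading anything. A minimal sketch reusing the `pipe` object from above:

```python
# Re-run the same pipeline with more steps for finer detail.
image = pipe(
    prompt="a photo of a cat",
    num_inference_steps=4,  # any value from 1 to 4
    guidance_scale=0,
).images[0]
image.save("cat_4step.png")
```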

## License

NitroSD-Realism is released under [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en), following its base model *DMD2*.

NitroSD-Vibrant is released under [openrail++](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md).