---
base_model:
- tianweiy/DMD2
- ByteDance/Hyper-SD
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
- text-to-image
- stable-diffusion
- sdxl
- adversarial diffusion distillation
---
# NitroFusion
> **NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training**
>
> Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song
[[arXiv Paper]](https://arxiv.org/abs/2412.02030) [[Project Page]](https://chendaryen.github.io/NitroFusion.github.io/)
![NitroFusion banner](./assets/banner.jpg)
## News
* 04 Dec 2024: [Paper](https://arxiv.org/abs/2412.02030) is released on arXiv, and the [project page](https://chendaryen.github.io/NitroFusion.github.io/) is now public.
* 30 Nov 2024: Our single-step text-to-image demo is publicly available on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).
* 29 Nov 2024: Released two checkpoints: **NitroSD-Realism** and **NitroSD-Vibrant**.
## Online Demos
The NitroFusion single-step text-to-image demo is hosted on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).
## Model Overview
- `nitrosd-realism_unet.safetensors`: Produces photorealistic images with fine details.
- `nitrosd-vibrant_unet.safetensors`: Offers vibrant, saturated color characteristics.
- Both models support 1 to 4 inference steps.
## Usage
First, we implement the timestep-shift scheduler that NitroSD requires for multi-step inference:
```python
from diffusers import LCMScheduler


class TimestepShiftLCMScheduler(LCMScheduler):
    """LCMScheduler variant that feeds the UNet shifted (rescaled) timesteps."""

    def __init__(self, *args, shifted_timestep=250, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_to_config(shifted_timestep=shifted_timestep)

    def set_timesteps(self, *args, **kwargs):
        super().set_timesteps(*args, **kwargs)
        # Keep the original schedule for the stepping math, and expose a copy
        # rescaled into [0, shifted_timestep) for the UNet forward passes.
        self.origin_timesteps = self.timesteps.clone()
        self.shifted_timesteps = (self.timesteps * self.config.shifted_timestep / self.config.num_train_timesteps).long()
        self.timesteps = self.shifted_timesteps

    def step(self, model_output, timestep, sample, generator=None, return_dict=True):
        if self.step_index is None:
            self._init_step_index(timestep)
        # Temporarily restore the original timesteps so the parent class
        # performs the denoising step on the true schedule.
        self.timesteps = self.origin_timesteps
        output = super().step(model_output, timestep, sample, generator, return_dict)
        self.timesteps = self.shifted_timesteps
        return output
```
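To see what the shift does, here is an optional, minimal sanity check (not part of the pipeline below; `shifted_timestep=250` follows the NitroSD-Realism setup used later) that prints the original and shifted schedules for 4-step inference:
```python
# Optional: compare the original and shifted schedules (assumes the class above).
scheduler = TimestepShiftLCMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler", shifted_timestep=250
)
scheduler.config.original_inference_steps = 4
scheduler.set_timesteps(num_inference_steps=4)
print(scheduler.origin_timesteps)  # original LCM timesteps used internally by step()
print(scheduler.timesteps)         # shifted timesteps the UNet will actually see
```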
We can then use the diffusers pipeline:
```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ChenDY/NitroFusion"

# Load model: NitroSD-Realism
ckpt = "nitrosd-realism_unet.safetensors"
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=250)
scheduler.config.original_inference_steps = 4

# # Alternatively, load NitroSD-Vibrant:
# ckpt = "nitrosd-vibrant_unet.safetensors"
# unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
# unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=500)
# scheduler.config.original_inference_steps = 4

pipe = DiffusionPipeline.from_pretrained(
    base_model_id,
    unet=unet,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "a photo of a cat"
image = pipe(
    prompt=prompt,
    num_inference_steps=1,  # NitroSD-Realism and -Vibrant both support 1 to 4 inference steps.
    guidance_scale=0,
).images[0]
```
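Raising `num_inference_steps` (up to 4) trades a little speed for sharper detail. A minimal sketch reusing the `pipe` built above (the output filename is arbitrary):
```python
# Multi-step variant: same pipeline, more refinement steps (1-4 supported).
image = pipe(
    prompt="a photo of a cat",
    num_inference_steps=4,
    guidance_scale=0,  # guidance is disabled, as in the example above
).images[0]
image.save("cat_4step.png")  # hypothetical output path
```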
## License
NitroSD-Realism is released under the [cc-by-nc-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) license, following its base model *DMD2*.

NitroSD-Vibrant is released under the [OpenRAIL++](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) license.