metadata
base_model:
- tianweiy/DMD2
- ByteDance/Hyper-SD
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
- text-to-image
- stable-diffusion
- sdxl
- adversarial diffusion distillation
NitroFusion
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song
News
- 04 Dec 2024: Paper is released on arXiv, and the project page is now public.
- 30 Nov 2024: Our single-step text-to-image demo is publicly available on 🤗 Hugging Face Space.
- 29 Nov 2024: Released two checkpoints: NitroSD-Realism and NitroSD-Vibrant.
Online Demos
NitroFusion single-step Text-to-Image demo hosted on 🤗 Hugging Face Space
Model Overview
- nitrosd-realism_unet.safetensors: Produces photorealistic images with fine details.
- nitrosd-vibrant_unet.safetensors: Offers vibrant, saturated color characteristics.
- Both models support 1 to 4 inference steps.
Usage
First, we need to implement the scheduler with timestep shift for multi-step inference:
from diffusers import LCMScheduler
class TimestepShiftLCMScheduler(LCMScheduler):
    def __init__(self, *args, shifted_timestep=250, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_to_config(shifted_timestep=shifted_timestep)

    def set_timesteps(self, *args, **kwargs):
        super().set_timesteps(*args, **kwargs)
        # Keep the unshifted schedule and expose a rescaled copy as self.timesteps.
        self.origin_timesteps = self.timesteps.clone()
        self.shifted_timesteps = (self.timesteps * self.config.shifted_timestep / self.config.num_train_timesteps).long()
        self.timesteps = self.shifted_timesteps

    def step(self, model_output, timestep, sample, generator=None, return_dict=True):
        if self.step_index is None:
            self._init_step_index(timestep)
        # Run the parent step with the unshifted timesteps in place, then switch back.
        self.timesteps = self.origin_timesteps
        output = super().step(model_output, timestep, sample, generator, return_dict)
        self.timesteps = self.shifted_timesteps
        return output
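To see what the timestep shift does numerically, the following minimal sketch reproduces the arithmetic from `set_timesteps` in plain Python, without diffusers. The helper name `shift_timesteps` and the example schedule `[999, 749, 499, 249]` are illustrative assumptions, not part of the NitroFusion release; `int()` truncation stands in for the `.long()` cast applied to the tensor.

```python
def shift_timesteps(timesteps, shifted_timestep=250, num_train_timesteps=1000):
    """Rescale each timestep by shifted_timestep / num_train_timesteps, truncating to int."""
    return [int(t * shifted_timestep / num_train_timesteps) for t in timesteps]


# Assumed 4-step schedule, used here for illustration only.
example_schedule = [999, 749, 499, 249]

realism_shift = shift_timesteps(example_schedule, shifted_timestep=250)
vibrant_shift = shift_timesteps(example_schedule, shifted_timestep=500)

print(realism_shift)  # [249, 187, 124, 62]
print(vibrant_shift)  # [499, 374, 249, 124]
```

With `shifted_timestep=250` every timestep is mapped into the range below 250, and with `shifted_timestep=500` into the range below 500, which are the two values used for the checkpoints in the usage snippet.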
We can then use the diffusers pipeline:
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
# Load model.
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ChenDY/NitroFusion"
# NitroSD-Realism
ckpt = "nitrosd-realism_unet.safetensors"
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=250)
scheduler.config.original_inference_steps = 4
# # NitroSD-Vibrant
# ckpt = "nitrosd-vibrant_unet.safetensors"
# unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
# unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=500)
# scheduler.config.original_inference_steps = 4
pipe = DiffusionPipeline.from_pretrained(
    base_model_id,
    unet=unet,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a photo of a cat"
image = pipe(
    prompt=prompt,
    num_inference_steps=1,  # NitroSD-Realism and NitroSD-Vibrant both support 1 to 4 inference steps.
    guidance_scale=0,
).images[0]
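In the snippet above, the two checkpoints differ only in the weights file and in the `shifted_timestep` passed to the scheduler (250 for NitroSD-Realism, 500 for NitroSD-Vibrant). One way to avoid commenting blocks in and out is a small lookup table. The names `NITROSD_VARIANTS` and `get_variant` below are hypothetical helpers for illustration, not part of the released code.

```python
# Hypothetical lookup table; file names and shift values are taken from the usage snippet.
NITROSD_VARIANTS = {
    "realism": {"ckpt": "nitrosd-realism_unet.safetensors", "shifted_timestep": 250},
    "vibrant": {"ckpt": "nitrosd-vibrant_unet.safetensors", "shifted_timestep": 500},
}


def get_variant(name):
    """Return the checkpoint file name and scheduler shift for a variant name."""
    key = name.lower()
    if key not in NITROSD_VARIANTS:
        raise ValueError(f"Unknown variant {name!r}; expected one of {sorted(NITROSD_VARIANTS)}")
    return NITROSD_VARIANTS[key]


cfg = get_variant("vibrant")
print(cfg["ckpt"], cfg["shifted_timestep"])
```

The returned values could then be passed to `hf_hub_download(repo, cfg["ckpt"])` and to `TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=cfg["shifted_timestep"])` in place of the hard-coded ones.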
License
NitroSD-Realism is released under cc-by-nc-4.0, following its base model DMD2.
NitroSD-Vibrant is released under openrail++.