---
base_model:
- tianweiy/DMD2
- ByteDance/Hyper-SD
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
- text-to-image
- stable-diffusion
- sdxl
- adversarial diffusion distillation
---

# NitroFusion

> **NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training**
>
> Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song

[[arXiv Paper]](https://arxiv.org/abs/2412.02030) [[Project Page]](https://chendaryen.github.io/NitroFusion.github.io/)
![](./assets/banner.jpg)

## News

* 04 Dec 2024: [Paper](https://arxiv.org/abs/2412.02030) is released on arXiv, and the [project page](https://chendaryen.github.io/NitroFusion.github.io/) is now public.
* 30 Nov 2024: Our single-step text-to-image demo is publicly available on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).
* 29 Nov 2024: Released two checkpoints: **NitroSD-Realism** and **NitroSD-Vibrant**.

## Online Demos

Try the NitroFusion single-step text-to-image demo on [🤗 Hugging Face Space](https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I).

## Model Overview

- `nitrosd-realism_unet.safetensors`: Produces photorealistic images with fine details.
- `nitrosd-vibrant_unet.safetensors`: Offers vibrant, saturated color characteristics.
- Both models support 1 to 4 inference steps.

## Usage

First, we implement the timestep-shifted scheduler that NitroSD uses for multi-step inference:

```python
from diffusers import LCMScheduler

class TimestepShiftLCMScheduler(LCMScheduler):
    def __init__(self, *args, shifted_timestep=250, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_to_config(shifted_timestep=shifted_timestep)

    def set_timesteps(self, *args, **kwargs):
        super().set_timesteps(*args, **kwargs)
        # Keep the original schedule for the denoising math, and expose a
        # rescaled copy in [0, shifted_timestep] that the UNet is conditioned on.
        self.origin_timesteps = self.timesteps.clone()
        self.shifted_timesteps = (self.timesteps * self.config.shifted_timestep / self.config.num_train_timesteps).long()
        self.timesteps = self.shifted_timesteps

    def step(self, model_output, timestep, sample, generator=None, return_dict=True):
        if self.step_index is None:
            self._init_step_index(timestep)
        # Step with the original (unshifted) timesteps, then restore the shifted view.
        self.timesteps = self.origin_timesteps
        output = super().step(model_output, timestep, sample, generator, return_dict)
        self.timesteps = self.shifted_timesteps
        return output
```
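
To sanity-check the shift, you can print both schedules side by side. A minimal sketch, assuming the standard 1000 training timesteps; exact values may vary slightly across `diffusers` versions:

```python
# 4-step schedule: step() consumes the original timesteps,
# while the UNet sees the rescaled (shifted) ones.
sched = TimestepShiftLCMScheduler(num_train_timesteps=1000, shifted_timestep=250)
sched.set_timesteps(num_inference_steps=4, original_inference_steps=4)
print(sched.origin_timesteps)  # e.g. tensor([999, 749, 499, 249])
print(sched.timesteps)         # e.g. tensor([249, 187, 124,  62])
```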

We can then set up the `diffusers` pipeline:

```python
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Load model.
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ChenDY/NitroFusion"

# NitroSD-Realism
ckpt = "nitrosd-realism_unet.safetensors"
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=250)
scheduler.config.original_inference_steps = 4

# # NitroSD-Vibrant
# ckpt = "nitrosd-vibrant_unet.safetensors"
# unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
# unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=500)
# scheduler.config.original_inference_steps = 4

pipe = DiffusionPipeline.from_pretrained(
    base_model_id,
    unet=unet,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "a photo of a cat"
image = pipe(
    prompt=prompt,
    num_inference_steps=1,  # NitroSD-Realism and -Vibrant both support 1 to 4 inference steps.
    guidance_scale=0,       # Distilled models run without classifier-free guidance.
).images[0]
```
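
Both checkpoints accept anywhere from 1 to 4 steps, so you can trade a little speed for extra detail without reloading anything. A minimal sketch reusing the `pipe` object from above:

```python
# Re-run the same pipeline with more steps for finer detail.
image = pipe(
    prompt="a photo of a cat",
    num_inference_steps=4,  # any value from 1 to 4
    guidance_scale=0,
).images[0]
image.save("cat_4step.png")
```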

## License

NitroSD-Realism is released under [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en), following its base model *DMD2*.

NitroSD-Vibrant is released under [openrail++](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md).