license: apache-2.0
tags:
- yahoo-open-source-software-incubator
Salient Object Aware Background Generation
This repository accompanies our paper, Salient Object-Aware Background Generation using Text-Guided Diffusion Models, which has been accepted for publication at the CVPR 2024 Generative Models for Computer Vision workshop.
The paper addresses an issue we call "object expansion" when generating backgrounds for salient objects using inpainting diffusion models. We show that models such as Stable Inpainting can sometimes arbitrarily expand or distort the salient object, which is undesirable in applications where the object's identity should be preserved, such as e-commerce ads. We provide some examples of object expansion as follows:
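Beyond visual inspection, object expansion can also be checked numerically. The snippet below is a minimal sketch under our own assumptions, not the evaluation code from the paper: it assumes you already have a binary mask of the salient object in the input image and a second mask obtained by running a salient object segmentation model on the generated image. The function name `object_expansion` and the normalization by image area are illustrative choices.

```python
import numpy as np


def object_expansion(input_mask: np.ndarray, output_mask: np.ndarray) -> float:
    """Fraction of the image newly covered by the object after generation.

    Hypothetical helper for illustration only. Both masks are HxW arrays
    where nonzero values mark salient-object pixels.
    """
    before = input_mask.astype(bool)
    after = output_mask.astype(bool)
    if before.shape != after.shape:
        raise ValueError("masks must have the same shape")
    # Pixels that belong to the object in the output but not in the input.
    grown = np.logical_and(after, np.logical_not(before))
    return float(grown.sum()) / before.size


# Toy example: a 2x2 object that grows by one column in a 4x4 image.
before = np.zeros((4, 4), dtype=np.uint8)
before[1:3, 1:3] = 1
after = np.zeros((4, 4), dtype=np.uint8)
after[1:3, 1:4] = 1
score = object_expansion(before, after)
```

A score of 0 means the generated image added no object pixels outside the original mask; larger values indicate more expansion.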
Setup
The dependencies are listed in requirements.txt. Install them with:
pip install -r requirements.txt
Usage
Training
The following command trains a text-to-image inpainting ControlNet initialized with the weights of "stable-diffusion-2-inpainting":
accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet_inpaint.py --pretrained_model_name_or_path "stable-diffusion-2-inpainting" --proportion_empty_prompts 0.1
The following command trains a text-to-image ControlNet initialized with the weights of "stable-diffusion-2-base":
accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet.py --pretrained_model_name_or_path "stable-diffusion-2-base" --proportion_empty_prompts 0.1
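Both commands pass `--proportion_empty_prompts 0.1`. Assuming the flag behaves as it does in the Hugging Face diffusers ControlNet training example (an assumption; check the training scripts in this repository), it replaces roughly that fraction of captions with empty strings so the model also learns to generate without a text prompt. The sketch below illustrates the idea; `drop_prompts` is a hypothetical helper, not a function from this repository.

```python
import random


def drop_prompts(captions, proportion_empty_prompts=0.1, seed=None):
    """Replace a random subset of captions with empty strings.

    Hypothetical helper for illustration only. Each caption is dropped
    independently with probability `proportion_empty_prompts`.
    """
    rng = random.Random(seed)
    return [
        "" if rng.random() < proportion_empty_prompts else caption
        for caption in captions
    ]


captions = ["a sneaker on a beach", "a watch on a marble table"]
kept_all = drop_prompts(captions, proportion_empty_prompts=0.0)
dropped_all = drop_prompts(captions, proportion_empty_prompts=1.0)
```

With a value of 0.1, about one caption in ten is emptied on average.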
Inference
Please refer to inference.ipynb. To run the code, you need to download our model checkpoints.
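For orientation, background generation with an inpainting model needs a mask marking the region to regenerate, which here is everything except the salient object. The sketch below shows one way to prepare such inputs with NumPy. It is an assumption about typical preprocessing and may differ from what inference.ipynb does; `make_inpainting_inputs` is a hypothetical name, and the white-means-regenerate convention follows common Stable Diffusion inpainting pipelines.

```python
import numpy as np


def make_inpainting_inputs(image: np.ndarray, object_mask: np.ndarray):
    """Build an inpainting mask and a masked image from a salient-object mask.

    Hypothetical helper for illustration only.
    image: HxWx3 uint8 array.
    object_mask: HxW array, nonzero where the salient object is.
    """
    keep = object_mask.astype(bool)
    # White (255) marks background pixels to regenerate; black (0) is kept.
    inpaint_mask = np.where(keep, 0, 255).astype(np.uint8)
    # Blank out the background so only the object pixels remain.
    masked_image = image.copy()
    masked_image[~keep] = 0
    return inpaint_mask, masked_image


# Toy example: a 2x2 image whose top-left pixel is the object.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
object_mask = np.array([[1, 0], [0, 0]], dtype=np.uint8)
inpaint_mask, masked_image = make_inpainting_inputs(image, object_mask)
```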
Model Checkpoints
Model link | Datasets used |
---|---|
controlnet_inpainting_salient_aware.pth | Salient segmentation datasets, COCO |
Citations
If you find our work useful, please consider citing our paper:
@misc{eshratifar2024salient,
title={Salient Object-Aware Background Generation using Text-Guided Diffusion Models},
author={Amir Erfan Eshratifar and Joao V. B. Soares and Kapil Thadani and Shaunak Mishra and Mikhail Kuznetsov and Yueh-Ning Ku and Paloma de Juan},
year={2024},
eprint={2404.10157},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Maintainers
- Erfan Eshratifar: [email protected]
- Joao Soares: [email protected]
License
This project is licensed under the terms of the Apache 2.0 open source license. Please refer to LICENSE for the full terms.