<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Stable Diffusion pipelines

Stable Diffusion is a text-to-image _latent diffusion_ model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) dataset. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and can run on consumer GPUs.

Latent diffusion is the research on top of which Stable Diffusion was built. It was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer. You can learn more details about it in the [specific pipeline for latent diffusion](pipelines/latent_diffusion) that is part of 🤗 Diffusers.

For more details about how Stable Diffusion works and how it differs from the base latent diffusion model, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-announcement) and [this section of our own blog post](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work).
*Tips*:

- To tweak your prompts on a specific result you liked, you can generate your own latents, as demonstrated in the following notebook (a minimal sketch is also shown below): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb)
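
As a minimal sketch of that idea (not taken verbatim from the notebook; the prompt, seed, and model ID are only illustrative, and `.to("cuda")` assumes a GPU is available), you can create the initial latents yourself with a seeded `torch.Generator` and pass them to the pipeline through its `latents` argument. Reusing the same latents while tweaking the prompt keeps the overall composition comparable:

```python
>>> import torch
>>> from diffusers import StableDiffusionPipeline

>>> pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

>>> # seeded generator so the exact same latents can be recreated later
>>> generator = torch.Generator(device="cuda").manual_seed(42)

>>> # latents have shape (batch_size, unet input channels, height // 8, width // 8)
>>> latents = torch.randn(
...     (1, pipe.unet.config.in_channels, 64, 64),
...     generator=generator,
...     device="cuda",
... )

>>> # reusing `latents` with a modified prompt keeps the composition similar across generations
>>> image = pipe("a photograph of an astronaut riding a horse", latents=latents).images[0]
```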
*Overview*:

| Pipeline | Tasks | Colab | Demo |
|---|---|:---:|:---:|
| [StableDiffusionPipeline](./text2img) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) | [🤗 Stable Diffusion](https://huggingface.co/spaces/stabilityai/stable-diffusion) |
| [StableDiffusionImg2ImgPipeline](./img2img) | *Image-to-Image Text-Guided Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) | [🤗 Diffuse the Rest](https://huggingface.co/spaces/huggingface/diffuse-the-rest) |
| [StableDiffusionInpaintPipeline](./inpaint) | **Experimental** – *Text-Guided Image Inpainting* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb) | Coming soon |
| [StableDiffusionDepth2ImgPipeline](./depth2img) | **Experimental** – *Depth-to-Image Text-Guided Generation* | | Coming soon |
| [StableDiffusionImageVariationPipeline](./image_variation) | **Experimental** – *Image Variation Generation* | | [🤗 Stable Diffusion Image Variations](https://huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations) |
| [StableDiffusionUpscalePipeline](./upscale) | **Experimental** – *Text-Guided Image Super-Resolution* | | Coming soon |
| [StableDiffusionLatentUpscalePipeline](./latent_upscale) | **Experimental** – *Text-Guided Image Super-Resolution* | | Coming soon |
| [StableDiffusionInstructPix2PixPipeline](./pix2pix) | **Experimental** – *Text-Based Image Editing* | | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://huggingface.co/spaces/timbrooks/instruct-pix2pix) |
| [StableDiffusionAttendAndExcitePipeline](./attend_and_excite) | **Experimental** – *Text-to-Image Generation* | | [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite) |
| [StableDiffusionPix2PixZeroPipeline](./pix2pix_zero) | **Experimental** – *Text-Based Image Editing* | | [Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027) |
| [StableDiffusionModelEditingPipeline](./model_editing) | **Experimental** – *Text-to-Image Model Editing* | | [Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://arxiv.org/abs/2303.08084) |
## Tips

### How to load and use different schedulers

The Stable Diffusion pipeline uses the [`PNDMScheduler`] by default, but `diffusers` provides many other schedulers that can be used with it, such as [`DDIMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`], etc.

To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the [`EulerDiscreteScheduler`], you can do the following:

```python
>>> from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

>>> # Option 1: swap the scheduler on an already-loaded pipeline
>>> pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)

>>> # Option 2: load the scheduler first and pass it to `from_pretrained`
>>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")
>>> pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", scheduler=euler_scheduler)
```
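
Either way, the pipeline is then used exactly as before. A minimal usage sketch, continuing from the `pipeline` created above (the prompt and output filename are only examples, and `.to("cuda")` assumes a CUDA GPU is available):

```python
>>> pipeline = pipeline.to("cuda")
>>> image = pipeline("a photograph of an astronaut riding a horse").images[0]
>>> image.save("astronaut.png")
```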
### How to convert all use cases with multiple or single pipeline

If you want to use all possible use cases in a single `DiffusionPipeline`, you can either:

- Make use of the [Stable Diffusion Mega Pipeline](https://github.com/huggingface/diffusers/tree/main/examples/community#stable-diffusion-mega) or
- Make use of the `components` functionality to instantiate all components in the most memory-efficient way:

```python
>>> from diffusers import (
...     StableDiffusionPipeline,
...     StableDiffusionImg2ImgPipeline,
...     StableDiffusionInpaintPipeline,
... )

>>> text2img = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
>>> img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
>>> inpaint = StableDiffusionInpaintPipeline(**text2img.components)

>>> # now you can use text2img(...), img2img(...), inpaint(...) just like the call methods of each respective pipeline
```
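
Because the three pipelines share the exact same component objects, the model weights are only held in memory once. A rough usage sketch follows (the prompt and parameter values are purely illustrative, and moving to `"cuda"` assumes a GPU is available; `inpaint` is called the same way but additionally expects a `mask_image`):

```python
>>> # the modules are shared, so moving one pipeline to the GPU moves them for all three
>>> text2img = text2img.to("cuda")

>>> prompt = "a fantasy landscape, trending on artstation"

>>> # text-to-image
>>> image = text2img(prompt).images[0]

>>> # refine the result with image-to-image using the same weights
>>> refined = img2img(prompt=prompt, image=image, strength=0.75).images[0]
```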
## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput