How to use Stable Diffusion on Habana Gaudi
🤗 Diffusers is compatible with Habana Gaudi through 🤗 Optimum Habana.
Requirements
- Optimum Habana 1.5 or later (see the Optimum Habana documentation for installation instructions).
- SynapseAI 1.9.
Inference Pipeline
To generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate two objects:
- A pipeline with GaudiStableDiffusionPipeline. This pipeline supports text-to-image generation.
- A scheduler with GaudiDDIMScheduler. This scheduler has been optimized for Habana Gaudi.
When initializing the pipeline, you have to specify use_habana=True to deploy it on HPUs.
Furthermore, in order to get the fastest possible generations you should enable HPU graphs with use_hpu_graphs=True.
Finally, you will need to specify a Gaudi configuration which can be downloaded from the Hugging Face Hub.
from optimum.habana import GaudiConfig
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "stabilityai/stable-diffusion-2-base"
# Load the Gaudi-optimized scheduler from the model repository
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,  # deploy the pipeline on HPUs
    use_hpu_graphs=True,  # enable HPU graphs for faster generation
    gaudi_config="Habana/stable-diffusion",  # Gaudi configuration from the Hugging Face Hub
)
You can then call the pipeline to generate images in batches from one or several prompts:
outputs = pipeline(
    prompt=[
        "High quality photo of an astronaut riding a horse in space",
        "Face of a yellow cat, high resolution, sitting on a park bench",
    ],
    num_images_per_prompt=10,
    batch_size=4,
)
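To see how the arguments in the call above interact, here is a minimal sketch of the batching arithmetic. It assumes the pipeline splits the total number of requested images (number of prompts times num_images_per_prompt) into chunks of batch_size, rounding up. The count_batches helper is hypothetical and is not part of Optimum Habana.

```python
import math


def count_batches(num_prompts: int, num_images_per_prompt: int, batch_size: int) -> tuple[int, int]:
    """Return (total_images, num_batches) for a batched generation call.

    Assumption: the total number of images is split into chunks of
    `batch_size`, with the last chunk possibly being incomplete.
    """
    total_images = num_prompts * num_images_per_prompt
    num_batches = math.ceil(total_images / batch_size)
    return total_images, num_batches


# Values taken from the pipeline call above: 2 prompts, 10 images each, batches of 4.
total_images, num_batches = count_batches(num_prompts=2, num_images_per_prompt=10, batch_size=4)
print(total_images, num_batches)  # 20 5
```

Under this assumption, the call above produces 20 images in 5 full batches. Choosing a batch_size that divides the total number of images evenly avoids an incomplete final batch.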
For more information, check out Optimum Habana’s documentation and the example provided in the official GitHub repository.
Benchmark
Here are the latencies and throughputs for Habana first-generation Gaudi and Gaudi2 with the Habana/stable-diffusion Gaudi configuration (mixed precision bf16/fp32):
- Stable Diffusion v1.5 (512x512 resolution):
| | Latency (batch size = 1) | Throughput (batch size = 8) |
|---|---|---|
| first-generation Gaudi | 4.22s | 0.29 images/s |
| Gaudi2 | 1.70s | 0.925 images/s |
- Stable Diffusion v2.1 (768x768 resolution):
| | Latency (batch size = 1) | Throughput |
|---|---|---|
| first-generation Gaudi | 23.3s | 0.045 images/s (batch size = 2) |
| Gaudi2 | 7.75s | 0.14 images/s (batch size = 5) |
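
To put these figures in perspective, the short sketch below computes the Gaudi2 speedup over first-generation Gaudi from the tables above. The numbers are copied from the tables; the BENCHMARKS dictionary and the gaudi2_speedups helper are illustrative names, not part of any library.

```python
# Figures copied from the tables above: (latency in seconds, throughput in images/s).
BENCHMARKS = {
    "Stable Diffusion v1.5 (512x512)": {
        "first-generation Gaudi": (4.22, 0.29),
        "Gaudi2": (1.70, 0.925),
    },
    "Stable Diffusion v2.1 (768x768)": {
        "first-generation Gaudi": (23.3, 0.045),
        "Gaudi2": (7.75, 0.14),
    },
}


def gaudi2_speedups(results):
    """Return (latency_speedup, throughput_speedup) of Gaudi2 over first-generation Gaudi."""
    gaudi1_latency, gaudi1_throughput = results["first-generation Gaudi"]
    gaudi2_latency, gaudi2_throughput = results["Gaudi2"]
    # Lower latency is better, higher throughput is better, so the ratios are inverted.
    return gaudi1_latency / gaudi2_latency, gaudi2_throughput / gaudi1_throughput


for model, results in BENCHMARKS.items():
    latency_speedup, throughput_speedup = gaudi2_speedups(results)
    print(f"{model}: latency speedup {latency_speedup:.2f}x, throughput speedup {throughput_speedup:.2f}x")
```

Note that for Stable Diffusion v2.1 the two throughput figures were measured with different batch sizes (2 and 5), so that ratio compares each device at its own reported batch size, not at an identical setting.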