HuggingFace Stable Diffusion 1.5 model
Overview
This example shows how to run HuggingFace Stable Diffusion 1.5 on PyTriton.
The example consists of the following scripts:
- install.sh - installs additional packages and libraries required to run the example
- server.py - starts the model with Triton Inference Server
- client.py - executes HTTP/gRPC requests to the deployed model
And configurations:
- kubernetes - example Helm Charts for serving the model and running test inference in a Kubernetes cluster
Running example locally
To run the example locally, the torch package is required. It can be installed in your current environment using pip:
pip install torch
Alternatively, you can use the NVIDIA PyTorch container:
docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:23.10-py3 bash
If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.
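If you launch the container from a script rather than typing the command, the {repository_path} placeholders can be filled in programmatically. The following is a minimal Python sketch, assuming the repository is the current working directory; docker_run_command is a hypothetical helper name, and the flags are copied from the docker run command above.

```python
import os
import shlex
from typing import List

# Image tag taken from the docker run command above.
PYTORCH_IMAGE = "nvcr.io/nvidia/pytorch:23.10-py3"


def docker_run_command(repository_path: str, image: str = PYTORCH_IMAGE) -> List[str]:
    """Build the argv for the interactive PyTorch container.

    The repository is mounted at the same absolute path inside the container
    and used as the working directory, mirroring the {repository_path} placeholders.
    """
    repo = os.path.abspath(repository_path)
    return [
        "docker", "run", "-it",
        "--gpus", "1",
        "--shm-size", "8gb",
        "-v", f"{repo}:{repo}",
        "-w", repo,
        image,
        "bash",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable, shell-quoted command for the current directory.
    print(shlex.join(docker_run_command(os.getcwd())))
```

Returning an argument list instead of a single string keeps paths with spaces intact; shlex.join is only used to print a shell-safe version.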
The step-by-step guide:
- Install PyTriton following the installation instructions
- Install the additional packages using install.sh:
./install.sh
- In the current terminal, start the model on Triton using server.py:
./server.py
- Open a new terminal tab (e.g. Ctrl + T on Ubuntu) or window
- Go to the example directory
- Run client.py to perform queries on the model:
./client.py
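Loading a Stable Diffusion pipeline can take a while, so the client may fail if it is started before the server is ready. A minimal Python sketch for waiting first, assuming Triton's default HTTP port 8000 and the KServe v2 readiness endpoint /v2/health/ready; the helper names and the 600-second default timeout are hypothetical choices, not part of the example.

```python
import http.client
import time
import urllib.request
from typing import Callable

# Assumption: Triton's default HTTP port 8000 and the KServe v2 readiness endpoint.
READY_URL = "http://localhost:8000/v2/health/ready"


def http_ready(url: str = READY_URL) -> bool:
    """Return True if `url` answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False


def wait_until_ready(probe: Callable[[], bool] = http_ready,
                     timeout_s: float = 600.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `probe` every `interval_s` seconds until it succeeds or `timeout_s` passes."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling wait_until_ready() in the second terminal before starting the client returns True once the server answers with HTTP 200, or False after the timeout. The sleep and clock parameters exist only so the polling loop can be tested without real waiting.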
Running example on Kubernetes cluster
The following prerequisites must be met to run the example:
- Kubernetes cluster with an NVIDIA GPU node
- NVIDIA Device Plugin installed in the Kubernetes cluster
- Docker container registry accessible from the Kubernetes cluster
- Helm installed for creating the deployment and test job
Optionally, you may install the NVIDIA Container Toolkit and the NVIDIA GPU Operator, which enable additional features such as MIG or time-slicing support in the cluster. To learn more about setting up a Kubernetes cluster with NVIDIA GPUs, see the NVIDIA Cloud Native Documentation.
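Before deploying, you can check the GPU prerequisites: with the NVIDIA Device Plugin installed, GPU nodes advertise the nvidia.com/gpu resource in their status.allocatable field. The following Python sketch reads the output of kubectl get nodes -o json; the helper names are hypothetical, and GPUs exposed under other resource names (for example MIG devices with the mixed strategy) are not counted.

```python
import json
import subprocess
from typing import Dict

# Extended resource advertised by the NVIDIA Device Plugin for whole GPUs.
GPU_RESOURCE = "nvidia.com/gpu"


def allocatable_gpus(nodes_json: str) -> Dict[str, int]:
    """Map each node name to its allocatable GPU count.

    `nodes_json` is the output of `kubectl get nodes -o json`.
    """
    counts: Dict[str, int] = {}
    for node in json.loads(nodes_json).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        counts[node["metadata"]["name"]] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


def cluster_has_gpu() -> bool:
    """Query the current kubectl context and report whether any node has a GPU."""
    output = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return any(count > 0 for count in allocatable_gpus(output).values())
```

A node listed with 0 usually means the Device Plugin is not running on it or the node has no GPU.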
Below is a step-by-step guide; it assumes that all commands are executed from the root of the repository.
Follow these steps to run and test the example in the cluster:
- [Optional] Build the PyTriton wheel following the build instructions
- Prepare the tag under which the image will be pushed to your Docker container registry accessible from the Kubernetes cluster. Example for a local cluster (minikube, k3s) with a registry hosted inside the cluster:
export DOCKER_IMAGE_NAME_WITH_TAG=localhost:5000/stable-diffusion-example:latest
- Build and push the Docker container image to your registry:
# Export the base image used for build
export FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:23.10-py3
./examples/huggingface_stable_diffusion/kubernetes/build_and_push.sh
Note: By default, the container is built using the pytriton package from GitHub. To build the container with a locally built wheel, run export BUILD_FROM=dist before executing the script.
- Install the Helm Chart with deployment and service:
helm upgrade -i --set deployment.image=${DOCKER_IMAGE_NAME_WITH_TAG} \
stable-diffusion-example \
./examples/huggingface_stable_diffusion/kubernetes/deployment
- Install the Helm Chart with the client test:
helm install --set image=${DOCKER_IMAGE_NAME_WITH_TAG} \
stable-diffusion-example-test \
./examples/huggingface_stable_diffusion/kubernetes/test
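The build and Helm steps above can be scripted end to end. The following is a hedged Python sketch, assuming build_and_push.sh reads DOCKER_IMAGE_NAME_WITH_TAG, FROM_IMAGE_NAME and BUILD_FROM from the environment, as the export steps imply; build_env, helm_install_commands and deploy are hypothetical names, and the paths and release names are copied from the commands above.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Paths copied from the steps above; run from the repository root.
EXAMPLE_DIR = "./examples/huggingface_stable_diffusion/kubernetes"
FROM_IMAGE_NAME = "nvcr.io/nvidia/pytorch:23.10-py3"


def build_env(image_with_tag: str, use_local_wheel: bool = False,
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Environment for build_and_push.sh (assumed to read these variables)."""
    env = dict(os.environ if base is None else base)
    env["DOCKER_IMAGE_NAME_WITH_TAG"] = image_with_tag
    env["FROM_IMAGE_NAME"] = FROM_IMAGE_NAME
    if use_local_wheel:
        env["BUILD_FROM"] = "dist"  # use the wheel built locally into dist/
    else:
        env.pop("BUILD_FROM", None)  # default: pytriton package from GitHub
    return env


def helm_install_commands(image_with_tag: str) -> List[List[str]]:
    """The two Helm commands above as argument lists, in order."""
    return [
        # Deployment and service for the model server.
        ["helm", "upgrade", "-i", "--set", f"deployment.image={image_with_tag}",
         "stable-diffusion-example", f"{EXAMPLE_DIR}/deployment"],
        # Client test job; `helm install` fails if this release already exists.
        ["helm", "install", "--set", f"image={image_with_tag}",
         "stable-diffusion-example-test", f"{EXAMPLE_DIR}/test"],
    ]


def deploy(image_with_tag: str, use_local_wheel: bool = False) -> None:
    """Build and push the image, then install both charts; stop on the first error."""
    subprocess.run([f"{EXAMPLE_DIR}/build_and_push.sh"],
                   env=build_env(image_with_tag, use_local_wheel), check=True)
    for command in helm_install_commands(image_with_tag):
        subprocess.run(command, check=True)
```

For example, deploy("localhost:5000/stable-diffusion-example:latest") reproduces the local-cluster steps. The deployment uses helm upgrade -i, so re-running it updates the existing release, while the test chart must be uninstalled before it can be installed again.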
You can now review the logs from the running pods to verify that inference is running. To show the logs for a given pod, first list all running pods:
kubectl get pods
Next, show the logs from the server or client pod:
kubectl logs {NAME}
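Instead of copying pod names by hand, the default kubectl get pods table can be filtered by name prefix. A Python sketch, assuming the pod names start with the Helm release names (this depends on the charts' templates and is not confirmed here); the helper names and the pod names in any sample output are illustrative only. Note that the prefix stable-diffusion-example also matches the test pods.

```python
import subprocess
from typing import List


def pods_matching(get_pods_output: str, prefix: str) -> List[str]:
    """Return pod names from `kubectl get pods` table output that start with `prefix`."""
    names = []
    for line in get_pods_output.strip().splitlines()[1:]:  # skip the NAME/READY/... header
        columns = line.split()
        if columns and columns[0].startswith(prefix):
            names.append(columns[0])
    return names


def print_logs(prefix: str) -> None:
    """Print `kubectl logs` for every pod whose name starts with `prefix`."""
    table = subprocess.run(["kubectl", "get", "pods"],
                           capture_output=True, text=True, check=True).stdout
    for name in pods_matching(table, prefix):
        print(f"==> {name}")
        subprocess.run(["kubectl", "logs", name], check=True)
```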
To remove the installed charts, run:
helm uninstall stable-diffusion-example-test
helm uninstall stable-diffusion-example
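helm uninstall reports an error for a release that is not installed, for example when only the deployment chart was installed. A hedged Python sketch that removes only the releases listed by helm list -q, in the order shown above; releases_to_uninstall and uninstall_charts are hypothetical names, and helm list only covers the current namespace by default.

```python
import subprocess
from typing import List, Sequence

# Release names used by the Helm commands above; the test chart is removed first.
RELEASES = ("stable-diffusion-example-test", "stable-diffusion-example")


def releases_to_uninstall(helm_list_short: str, releases: Sequence[str] = RELEASES) -> List[str]:
    """Keep only releases that appear in `helm list -q` output, preserving order."""
    installed = set(helm_list_short.split())
    return [name for name in releases if name in installed]


def uninstall_charts() -> None:
    """Uninstall whichever example releases are present in the current namespace."""
    output = subprocess.run(["helm", "list", "-q"],
                            capture_output=True, text=True, check=True).stdout
    for name in releases_to_uninstall(output):
        subprocess.run(["helm", "uninstall", name], check=True)
```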