<!--
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# HuggingFace Stable Diffusion 1.5 model

## Overview

This example demonstrates how to serve the HuggingFace Stable Diffusion 1.5 model on PyTriton.

The example consists of the following scripts:

- `install.sh` - install additional packages and libraries required to run the example
- `server.py` - start the model with Triton Inference Server (a sketch of the binding is shown below)
- `client.py` - execute HTTP/gRPC requests to the deployed model

And the following configuration:

- `kubernetes` - example Helm Charts for serving and testing inference in a Kubernetes cluster
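
For orientation, here is a minimal sketch of how such a server binding can look with PyTriton and the `diffusers` library. The model name, tensor names, shapes, and pipeline options are illustrative assumptions, not copied from `server.py`; refer to the script itself for the exact implementation.

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

# Illustrative sketch: the model name, tensor names, and shapes are assumptions,
# not copied from server.py.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")


@batch
def _infer_fn(prompt: np.ndarray):
    # Prompts arrive as a batch of byte strings; decode them for the pipeline.
    prompts = [p[0].decode("utf-8") for p in prompt]
    images = pipe(prompts).images
    # Return the generated images as a batch of uint8 arrays (H, W, C).
    return {"image": np.stack([np.asarray(img, dtype=np.uint8) for img in images])}


with Triton() as triton:
    triton.bind(
        model_name="StableDiffusion_1_5",
        infer_func=_infer_fn,
        inputs=[Tensor(name="prompt", dtype=np.bytes_, shape=(1,))],
        outputs=[Tensor(name="image", dtype=np.uint8, shape=(-1, -1, 3))],
        config=ModelConfig(max_batch_size=4),
    )
    triton.serve()
```

`triton.serve()` blocks and exposes the bound model over Triton's HTTP and gRPC endpoints, which is what `client.py` talks to.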

## Running example locally

To run the example locally, the `torch` package is required. It can be installed in your current environment using pip:

```shell
pip install torch
```

Or you can use the NVIDIA PyTorch container:

```shell
docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:23.10-py3 bash
```

If you choose to use the container, we recommend installing the
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/overview.html).
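
In either setup, you can quickly confirm that PyTorch can see the GPU before starting the server. This is a minimal check, assuming a CUDA-capable device is available:

```python
import torch

# Verify that PyTorch was built with CUDA support and can see at least one GPU.
print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```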

The step-by-step guide:

1. Install PyTriton following
   the [installation instruction](../../README.md#installation)
2. Install the additional packages using `install.sh`

```shell
./install.sh
```

3. In the current terminal, start the model on Triton using `server.py` (a readiness check is sketched below the command):

```shell
./server.py
```
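
Once the server is up, you can verify that it accepts requests by querying Triton's standard HTTP readiness endpoint. The sketch below assumes the default HTTP port 8000 has not been changed:

```python
import urllib.request

# Triton exposes a readiness endpoint on its HTTP port (8000 by default).
with urllib.request.urlopen("http://localhost:8000/v2/health/ready") as response:
    # A 200 status code means the server is ready to serve inference requests.
    print("Server ready:", response.status == 200)
```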

4. Open a new terminal tab (e.g. `Ctrl + T` on Ubuntu) or window
5. Go to the example directory
6. Run `client.py` to perform queries on the model (a sketch of what the client does follows the command):

```shell
./client.py
```
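
Under the hood, the client uses PyTriton's `ModelClient` to send requests to the server. The following is a minimal sketch of that pattern; the model name, tensor names, and prompt are illustrative assumptions rather than the exact contents of `client.py`:

```python
import numpy as np

from pytriton.client import ModelClient

# Illustrative model and tensor names; the authoritative ones are defined in server.py.
with ModelClient("localhost", "StableDiffusion_1_5") as client:
    prompts = np.array([["A photo of an astronaut riding a horse on Mars"]], dtype=np.bytes_)
    result = client.infer_batch(prompt=prompts)
    # The server returns the generated image(s) as uint8 arrays.
    print(result["image"].shape)
```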

## Running example on Kubernetes cluster

The following prerequisites must be met to run the example:

- Kubernetes cluster with NVIDIA GPU node
- [NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin) installed in Kubernetes cluster
- Docker Containers Registry accessible from Kubernetes cluster
- [Installed Helm](https://helm.sh/docs/intro/install/) for creating the deployment and test job

Optionally, you may install the NVIDIA Container Toolkit and NVIDIA GPU Operator, which enable additional features
like [MIG](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-operator-mig.html) or
[Time Slicing](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html) support in the cluster.
To learn more about how to set up a Kubernetes cluster with NVIDIA GPUs, review the
[NVIDIA Cloud Native Documentation](https://docs.nvidia.com/datacenter/cloud-native/contents.html).

Below, we present a step-by-step guide assuming that **all the commands are executed from the root of repository**.

Follow these steps to run and test the example in the cluster:
1. [Optional] Build the PyTriton wheel following the [build instruction](../../docs/building.md)
2. Prepare the tag under which the image is going to be pushed to your Docker Containers Registry accessible from the
Kubernetes cluster. Example for a local cluster (minikube, k3s) with a registry hosted inside the cluster:
```shell
export DOCKER_IMAGE_NAME_WITH_TAG=localhost:5000/stable-diffusion-example:latest
```
3. Build and push the Docker container image to your registry:

```shell
# Export the base image used for build
export FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:23.10-py3
./examples/huggingface_stable_diffusion/kubernetes/build_and_push.sh
```

**Note**: By default, the container is built using the `pytriton` package from GitHub. To build the container with a
locally built wheel, run `export BUILD_FROM=dist` before executing the script.

4. Install the Helm Chart with deployment and service:

```shell
helm upgrade -i --set deployment.image=${DOCKER_IMAGE_NAME_WITH_TAG} \
stable-diffusion-example \
./examples/huggingface_stable_diffusion/kubernetes/deployment
```

5. Install the Helm Chart with the client test:

```shell
helm install --set image=${DOCKER_IMAGE_NAME_WITH_TAG} \
stable-diffusion-example-test \
./examples/huggingface_stable_diffusion/kubernetes/test
```

Now, you can review the logs from the running pods to verify that inference is running. To show the logs for a given
pod, first list all running pods:
```shell
kubectl get pods
```

Next, show the logs from the server or client pod:
```shell
kubectl logs {NAME}
```

To remove the installed charts, simply run:
```shell
helm uninstall stable-diffusion-example-test
helm uninstall stable-diffusion-example
```