# Deploying in Cluster

The library can run inside containers and be deployed on Kubernetes clusters. The following sections describe the prerequisites and settings that help you deploy it in your cluster.

## Health checks

The library uses the Triton Inference Server to handle HTTP/gRPC requests. Triton Server provides endpoints that report whether the server is ready and healthy. Your orchestrator can use the following HTTP endpoints to determine the application's readiness and liveness:

- Ready: `/v2/health/ready`
- Live: `/v2/health/live`

## Exposing ports

The library uses the Triton Inference Server, which exposes HTTP, gRPC, and metrics ports for communication. In the default configuration, the following ports must be exposed:

- 8000 for HTTP
- 8001 for gRPC
- 8002 for metrics

If the library runs inside a Docker container, publish the ports by passing `-p` arguments to the `docker run` command:

```shell
docker run -p 8000:8000 -p 8001:8001 -p 8002:8002 {image}
```

To deploy the container on Kubernetes, add a `ports` definition for the container in the YAML deployment configuration:

```yaml
containers:
  - name: pytriton
    ...
    ports:
      - containerPort: 8000
        name: http
      - containerPort: 8001
        name: grpc
      - containerPort: 8002
        name: metrics
```

## Configuring shared memory

Python callbacks and the Triton Inference Server run in separate processes and exchange data through shared memory. By default, a Docker container gets `64MB` of shared memory, which may not be enough to pass the model's input and output data. At startup, PyTriton initializes `16MB` of shared memory for the `Proxy Backend` to pass input/output tensors between processes; additional memory is allocated dynamically. If these allocations fail, you may need to increase the amount of shared memory available to the container.

To increase the shared memory size, pass the `--shm-size` flag to the `docker run` command.
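Before changing the flag, it can help to check how much shared memory a running container actually has, for example to confirm whether the `64MB` default is in effect. The script below is a minimal sketch, assuming a Linux container that provides `bash`, `df`, and `awk`. The `shm_size_mb` and `check_shm_size` helpers, the `MIN_SHM_MB` variable, and its 256MB default are hypothetical illustrations, not part of PyTriton or a documented requirement.

```shell
#!/usr/bin/env bash
# Sketch: report the size of /dev/shm and warn when it is below a threshold.
# The threshold (MIN_SHM_MB, default 256) is a hypothetical value chosen for
# illustration only; PyTriton does not define it.

# Print the total size of the /dev/shm mount in whole MB.
shm_size_mb() {
  # -P: POSIX output format, one line per filesystem.
  # -k: sizes in 1024-byte blocks. Row 2, column 2 is the total size.
  df -Pk /dev/shm | awk 'NR == 2 { print int($2 / 1024) }'
}

# Succeed (exit status 0) if /dev/shm is at least $1 MB, fail otherwise.
check_shm_size() {
  local min_mb="$1"
  [ "$(shm_size_mb)" -ge "$min_mb" ]
}

size_mb="$(shm_size_mb)"
min_mb="${MIN_SHM_MB:-256}"  # hypothetical threshold in MB
echo "/dev/shm size: ${size_mb}MB"
if ! check_shm_size "$min_mb"; then
  echo "WARNING: /dev/shm is below ${min_mb}MB; consider raising --shm-size" >&2
fi
```

One way to run it against an existing container is `docker exec -i {container} bash < check_shm.sh`, where `{container}` and `check_shm.sh` are placeholders for your container name and the saved script.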
For example, to set the shared memory size to 8GB:

```shell
docker run --shm-size 8GB {image}
```

On Kubernetes, increase the shared memory size by mounting a memory-backed `emptyDir` volume at `/dev/shm`:

```yaml
spec:
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory
  containers:
    - name: pytriton
      ...
      volumeMounts:
        - mountPath: /dev/shm
          name: shared-memory
```

## Specify container init process

Use the [`--init` flag](https://docs.docker.com/engine/reference/run/#specify-an-init-process) of the `docker run` command to run an init process as PID 1 in the container. The init process reaps zombie processes inside the container. Reaping zombie processes is important if a script hosting PyTriton terminates unexpectedly with an error.
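To see how the Docker settings on this page fit together outside an orchestrator, the script below starts a container with `--init`, `--shm-size 8GB`, and the three port mappings, then polls the readiness endpoint `/v2/health/ready` until it answers with a success status. It is a sketch under stated assumptions: the image name is passed as the first argument (standing in for `{image}`), the HTTP port is published on `localhost:8000`, `curl` is installed on the host, and the `wait_for_ready` helper and its 120-second timeout are hypothetical choices, not part of PyTriton.

```shell
#!/usr/bin/env bash
# Sketch: start a PyTriton container with the flags described on this page and
# wait until Triton reports readiness. wait_for_ready and the 120 s timeout
# are hypothetical; adjust them to your deployment.

# Poll URL about once per second until it returns a 2xx status or
# TIMEOUT_S seconds have passed. Returns 0 when ready, 1 on timeout.
wait_for_ready() {
  local url="$1"
  local timeout_s="${2:-60}"
  local deadline=$(( $(date +%s) + timeout_s ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # -f: treat HTTP error statuses (a not-ready server answers 4xx) as failure.
    # -s: silent; --max-time bounds each individual request.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Only start a container when an image name is given as the first argument.
if [ "$#" -ge 1 ]; then
  image="$1"
  container_id="$(docker run -d --init --shm-size 8GB \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 "$image")"
  if wait_for_ready "http://localhost:8000/v2/health/ready" 120; then
    echo "Container ${container_id} is ready"
  else
    echo "Container ${container_id} did not become ready in time" >&2
    docker logs "$container_id" >&2
    exit 1
  fi
fi
```

Polling the same readiness endpoint that an orchestrator would probe keeps local testing and cluster behavior aligned; the liveness endpoint `/v2/health/live` could be polled the same way.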