Using Perf Analyzer for profiling HuggingFace BART model
Overview
This example demonstrates profiling of a HuggingFace BART model using Perf Analyzer.
The example consists of the following scripts:
- install.sh - installs the additional packages and libraries required to run the example
- server.py - starts the model with Triton Inference Server
- client.sh - executes HTTP/gRPC requests against the deployed model
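The requests sent to the deployed model follow Triton's KServe-v2 HTTP inference protocol. Below is a minimal sketch of building such a request with only the standard library; the model name `BART`, the input tensor name `text`, and the port are assumptions for illustration, so check server.py for the actual values:

```python
import json
import urllib.request

# Assumed values -- verify against server.py in this example.
MODEL_NAME = "BART"                   # hypothetical model name
TRITON_URL = "http://localhost:8000"  # default Triton HTTP port


def build_infer_request(text: str) -> urllib.request.Request:
    """Build a KServe-v2 inference request for a single text input."""
    payload = {
        "inputs": [
            {
                "name": "text",   # hypothetical input tensor name
                "shape": [1, 1],
                "datatype": "BYTES",
                "data": [text],
            }
        ]
    }
    return urllib.request.Request(
        url=f"{TRITON_URL}/v2/models/{MODEL_NAME}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request("PyTriton serves models with a simple Python API.")
# Sending the request requires a running server:
# urllib.request.urlopen(req)
```

The request object is built but not sent, since sending it requires the server from server.py to be running.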
Requirements
The example requires the torch package. It can be installed in your current environment using pip:
pip install torch
Or you can use the NVIDIA PyTorch container:
docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:23.10-py3 bash
If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.
Quick Start
The step-by-step guide:
- Install PyTriton following the installation instructions
- Install the additional packages using install.sh:
./install.sh
- In the current terminal, start the model on Triton using server.py:
./server.py
- Open a new terminal tab (e.g. Ctrl + T on Ubuntu) or window
- Go to the example directory
- Run client.sh to perform queries on the model:
./client.sh
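Under the hood, client.sh invokes Perf Analyzer against the running Triton server. A sketch of a typical invocation is shown below; the model name and flag values are illustrative assumptions, so see client.sh for the exact command used in this example:

```shell
# Profile the deployed model over HTTP with increasing request concurrency.
# Model name and ranges are assumptions, not taken from client.sh.
perf_analyzer \
    -m BART \
    -u localhost:8000 \
    --concurrency-range 1:4 \
    --measurement-interval 5000
```

Perf Analyzer reports throughput and latency percentiles for each concurrency level, which is the profiling output this example is built around.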