Using Perf Analyzer for profiling HuggingFace BART model

Overview

The example presents profiling of the HuggingFace BART model using Perf Analyzer.

The example consists of the following scripts:

  • install.sh - install additional packages and libraries required to run the example
  • server.py - start the model with Triton Inference Server (a minimal sketch of such a server is shown after this list)
  • client.sh - execute HTTP/gRPC requests to the deployed model
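
For orientation, below is a minimal sketch of what a PyTriton-based server.py for this example could look like. The checkpoint (facebook/bart-large-cnn), the summarization task, and the tensor names sequence/summary are illustrative assumptions and may differ from the actual script; only the PyTriton calls themselves (Triton, bind, @batch) reflect the library's API.

import numpy as np
from transformers import pipeline

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

# Illustrative checkpoint and task; the actual server.py may use a different BART variant.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

@batch
def _infer_fn(sequence: np.ndarray):
    # PyTriton delivers string inputs as a (batch, 1) array of UTF-8 bytes.
    texts = np.char.decode(sequence.astype("bytes"), "utf-8").squeeze(axis=1).tolist()
    outputs = summarizer(texts)
    summaries = [o["summary_text"] for o in outputs]
    # Return a (batch, 1) array of UTF-8 bytes to match the declared output shape.
    return {"summary": np.char.encode(np.array(summaries)[:, None], "utf-8")}

with Triton() as triton:
    triton.bind(
        model_name="BART",
        infer_func=_infer_fn,
        inputs=[Tensor(name="sequence", dtype=bytes, shape=(1,))],
        outputs=[Tensor(name="summary", dtype=bytes, shape=(1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()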

Requirements

The example requires the torch package. It can be installed in your current environment using pip:

pip install torch

Or you can use the NVIDIA PyTorch container:

docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:23.10-py3 bash

If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.

Quick Start

The step-by-step guide:

  1. Install PyTriton following the installation instructions
  2. Install the additional packages using install.sh
./install.sh
  3. In the current terminal, start the model on Triton using server.py
./server.py
  4. Open a new terminal tab (e.g. Ctrl + T on Ubuntu) or window
  5. Go to the example directory
  6. Run client.sh to perform queries on the model (a short verification sketch in Python follows this list):
./client.sh
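
client.sh presumably drives Perf Analyzer against the deployed model. Before profiling, it can be useful to confirm that the endpoint answers correctly; the snippet below is a small, hedged verification client using PyTriton's ModelClient. The model name BART and the sequence/summary tensor names are the same assumptions as in the server sketch above and should be adjusted to match server.py.

import numpy as np
from pytriton.client import ModelClient

# Assumed model and tensor names; adjust them to match the binding in server.py.
with ModelClient("localhost", "BART") as client:
    text = np.array([["Triton Inference Server simplifies deployment of AI models at scale."]])
    # Inputs are sent as a (batch, 1) array of UTF-8 bytes, mirroring the server sketch.
    result = client.infer_batch(sequence=np.char.encode(text, "utf-8"))
    print(result["summary"])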