Using Perf Analyzer for profiling HuggingFace BART model
Overview
The example presents profiling of a HuggingFace BART model using Perf Analyzer.
The example consists of the following scripts:
- install.sh - installs the additional packages and libraries required to run the example
- server.py - starts the model with Triton Inference Server
- client.py - executes HTTP/gRPC requests to the deployed model
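As a sketch of what the client side does, a request to Triton's KServe-v2 HTTP endpoint can be built with only the standard library. The input name `text`, the `BYTES` datatype, and the model name in the URL are assumptions for illustration, not values taken from the example's scripts:

```python
import json

def build_infer_payload(model_input: str) -> dict:
    # KServe v2 inference request body; the input name "text" and the BYTES
    # datatype are assumptions about how server.py registers the model.
    return {
        "inputs": [
            {
                "name": "text",
                "shape": [1, 1],
                "datatype": "BYTES",
                "data": [model_input],
            }
        ]
    }

# The serialized payload would be POSTed to an endpoint such as
# http://localhost:8000/v2/models/BART/infer (Triton's default HTTP port).
payload = json.dumps(build_infer_payload("A sample sentence to summarize."))
```

The gRPC path follows the same v2 protocol with a binary encoding instead of JSON.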
Requirements
The example requires the torch
package. It can be installed in your current environment using pip:
pip install torch
Or you can use NVIDIA PyTorch container:
docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:23.10-py3 bash
If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.
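Before starting the server, you can verify that the torch package is importable in the active environment. This is a generic standard-library check, not part of the example's scripts:

```python
import importlib.util

def has_package(name: str) -> bool:
    # find_spec returns None when the top-level package cannot be found
    return importlib.util.find_spec(name) is not None

if not has_package("torch"):
    print("torch is missing - run: pip install torch")
```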
Quick Start
The step-by-step guide:
- Install PyTriton following the installation instructions
- Install the additional packages using install.sh:

./install.sh

- In the current terminal, start the model on Triton using server.py:

./server.py

- Open a new terminal tab (e.g. Ctrl + T on Ubuntu) or window
- Go to the example directory
- Run client.sh to perform queries on the model:

./client.sh
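A typical Perf Analyzer invocation, of the kind client.sh might wrap, looks like the sketch below. The model name and concurrency range are placeholders; check client.sh for the exact flags this example uses:

```shell
# Hypothetical model name - use the name under which server.py registers the model.
MODEL_NAME=BART

# Guarded so the snippet is safe to run even where perf_analyzer is absent;
# -m selects the model, --concurrency-range sweeps concurrent request levels.
if command -v perf_analyzer >/dev/null 2>&1; then
  perf_analyzer -m "$MODEL_NAME" --concurrency-range 1:4
fi
```

Perf Analyzer reports latency and throughput per concurrency level, which is the profiling output this example is after.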