Using Perf Analyzer for profiling HuggingFace BART model
Overview
This example demonstrates profiling of a HuggingFace BART model using Perf Analyzer.
The example consists of the following scripts:
- install.sh - installs the additional packages and libraries required to run the example
- server.py - starts the model with Triton Inference Server
- client.sh - executes HTTP/gRPC requests against the deployed model
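The requests sent to the deployed model follow Triton's KServe-v2 HTTP inference protocol. Below is a minimal sketch of building such a request with only the standard library; the model name `BART`, the input tensor name `text`, and the port are assumptions for illustration, so check server.py for the actual values:

```python
import json
import urllib.request

# Assumed values -- verify against server.py in this example.
MODEL_NAME = "BART"                   # hypothetical model name
TRITON_URL = "http://localhost:8000"  # default Triton HTTP port


def build_infer_request(text: str) -> urllib.request.Request:
    """Build a KServe-v2 inference request for a single text input."""
    payload = {
        "inputs": [
            {
                "name": "text",   # hypothetical input tensor name
                "shape": [1, 1],
                "datatype": "BYTES",
                "data": [text],
            }
        ]
    }
    return urllib.request.Request(
        url=f"{TRITON_URL}/v2/models/{MODEL_NAME}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request("PyTriton serves models with a simple Python API.")
# Sending the request requires a running server:
# urllib.request.urlopen(req)
```

The request object is built but not sent, since sending it requires the server from server.py to be running.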
Requirements
The example requires the torch package. It can be installed in your current environment using pip:
pip install torch
Or you can use the NVIDIA PyTorch container:
docker run -it --gpus 1 --shm-size 8gb -v {repository_path}:{repository_path} -w {repository_path} nvcr.io/nvidia/pytorch:23.10-py3 bash
If you choose to use the container, we recommend installing the NVIDIA Container Toolkit.
Quick Start
The step-by-step guide:
- Install PyTriton following the installation instructions
- Install the additional packages using install.sh:
./install.sh
- In the current terminal, start the model on Triton using server.py:
./server.py
- Open a new terminal tab (e.g. Ctrl + T on Ubuntu) or window
- Go to the example directory
- Run client.sh to perform queries on the model:
./client.sh
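Under the hood, client.sh invokes Perf Analyzer against the running Triton server. A sketch of a typical invocation is shown below; the model name and flag values are illustrative assumptions, so see client.sh for the exact command used in this example:

```shell
# Profile the deployed model over HTTP with increasing request concurrency.
# Model name and ranges are assumptions, not taken from client.sh.
perf_analyzer \
    -m BART \
    -u localhost:8000 \
    --concurrency-range 1:4 \
    --measurement-interval 5000
```

Perf Analyzer reports throughput and latency percentiles for each concurrency level, which is the profiling output this example is built around.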