Inference Endpoints Version
Hugging Face Inference Endpoints comes with a default serving container which is used for all supported Transformers and Sentence-Transformers tasks and for custom inference handlers, and it implements batching. Below you will find information about the installed packages and the versions used.
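For a custom inference handler, the default container loads a handler.py from the root of the model repository. The following is a minimal sketch following the EndpointHandler convention; the text-classification task and payload handling are illustrative.

```python
# handler.py — minimal sketch of a custom inference handler.
# The EndpointHandler class name and the __init__/__call__ signatures follow the
# custom handler convention; the task and payload handling are illustrative.
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the local checkout of the model repository inside the container.
        self.pipeline = pipeline("text-classification", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Requests arrive as JSON with the usual {"inputs": ..., "parameters": ...} shape.
        inputs = data.pop("inputs", data)
        parameters = data.pop("parameters", None) or {}
        return self.pipeline(inputs, **parameters)
```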
You can always upgrade installed packages and add custom packages by adding a requirements.txt
file to your model repository. Read more in Add custom Dependencies.
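For example, a requirements.txt at the root of the model repository could upgrade an installed package and add a new one (the package names and versions below are purely illustrative):

```
# Upgrade a preinstalled package and add an extra dependency (illustrative).
diffusers>=0.27.0
einops==0.7.0
```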
Installed packages & versions
The Hugging Face Inference Runtime has separate versions for PyTorch and TensorFlow, for CPU and GPU, which are used based on the selected framework when creating an Inference Endpoint. The TensorFlow and PyTorch flavors are grouped together in the list below.
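The framework and accelerator are chosen when the endpoint is created, for example programmatically via huggingface_hub. The sketch below assumes a recent huggingface_hub client with create_inference_endpoint; the model, vendor, region, instance type, and instance size are illustrative and depend on what is available to your account.

```python
# Create an endpoint on a GPU instance using the PyTorch flavor of the runtime.
# Vendor, region, instance_type, and instance_size values are illustrative.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-endpoint",
    repository="distilbert-base-uncased-finetuned-sst-2-english",
    framework="pytorch",
    task="text-classification",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_type="nvidia-t4",
    instance_size="x1",
)
endpoint.wait()  # block until the endpoint is running
print(endpoint.url)
```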
General
Python: 3.11
huggingface_hub: 0.20.3
pytorch: 2.2.0
transformers[sklearn,sentencepiece,audio,vision]: 4.38.2
diffusers: 0.26.3
accelerate: 0.27.2
sentence_transformers: 2.4.0
pandas: latest
peft: 0.9.0
tensorflow: latest
GPU
CUDA: 12.3
Optimized Container
text-generation-inference: 2.1.0
text-embeddings-inference: 1.2.0