Google TPUs documentation

Differences between Jetstream Pytorch and PyTorch XLA


This guide explains the difference between JetStream PyTorch and PyTorch XLA, the two backends available for serving in TGI with optimum-tpu.

JetStream PyTorch is a high-performance inference engine built on top of PyTorch XLA. It is optimized for throughput and memory efficiency when running Large Language Models (LLMs) on TPUs.

| Feature | JetStream PyTorch | PyTorch XLA |
|---|---|---|
| Training | Not supported | Supported |
| Serving | Supported | Supported |
| Performance | Higher serving performance | Standard performance |
| Flexibility | Limited to serving | Full PyTorch ecosystem |
| Use Case | Production inference | Development and training |
| Integration | Optimized for deployment | Standard PyTorch workflow |

Note: by default, optimum-tpu uses PyTorch XLA for training and JetStream PyTorch for serving.

You can configure optimum-tpu to use either backend for serving with TGI. To use the PyTorch XLA backend instead, set -e JETSTREAM_PT_DISABLE=1 in your docker run arguments.
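As a minimal sketch, the environment variable is passed like any other to docker run; the image name, model id, and volume path below are illustrative placeholders, not fixed values from the optimum-tpu documentation:

```shell
# Disable the default JetStream PyTorch backend so TGI serves
# with PyTorch XLA instead. The image name and model id are
# placeholders -- substitute the TGI TPU image and model you use.
docker run -p 8080:80 \
    -v "$HOME/data:/data" \
    -e JETSTREAM_PT_DISABLE=1 \
    <your-tgi-tpu-image> \
    --model-id <model-id>
```

Without JETSTREAM_PT_DISABLE=1, the same command would serve through the default JetStream PyTorch backend.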

You can find more information about: