
	---
	library_name: transformers
	license: apache-2.0
	tags:
	- tokenizer
	pipeline_tag: robotics
	---

	# FAST: Efficient Action Tokenization for Vision-Language-Action Models

	This is the official repo for the [FAST action tokenizer](https://www.pi.website/research/fast) from the paper [FAST: Efficient Action Tokenization for Vision-Language-Action Models](https://huggingface.co/papers/2501.09747).

	The action tokenizer maps any sequence of robot actions into a sequence of dense, discrete action tokens for training autoregressive VLA models.
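As described in the FAST paper, the compression step applies a discrete cosine transform (DCT) to each action dimension over time. It then scales and rounds the coefficients, and finally runs byte-pair encoding (BPE) over the result. The sketch below covers only the DCT-and-quantize part, using `scipy.fft`. The function names and the `scale=10.0` value are illustrative assumptions, not the processor's actual code, and the BPE stage is omitted.

```python
import numpy as np
from scipy.fft import dct, idct


def dct_quantize(chunk: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Map a [time_horizon, action_dim] chunk to integer DCT coefficients (illustrative)."""
    # Orthonormal DCT-II along the time axis, independently for each action dimension.
    coeffs = dct(chunk, type=2, axis=0, norm="ortho")
    # Scale and round; a larger `scale` keeps more detail but yields fewer zero coefficients.
    return np.round(coeffs * scale).astype(np.int64)


def dequantize(q: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Approximate inverse of dct_quantize (rounding error is not recoverable)."""
    return idct(q / scale, type=2, axis=0, norm="ortho")


def flatten_low_freq_first(q: np.ndarray) -> np.ndarray:
    """Row-major flatten of [time, dim]: every dimension's lowest frequency comes first."""
    return q.reshape(-1)


# Usage: a smooth dummy chunk of 50 timesteps x 14 action dimensions.
t = np.linspace(0.0, np.pi, 50)
chunk = np.tile(0.8 * np.sin(t)[:, None], (1, 14))
q = dct_quantize(chunk)
recon = dequantize(q)
flat = flatten_low_freq_first(q)
```

For smooth trajectories, most high-frequency coefficients round to zero. This sparsity is what makes the subsequent BPE step effective at compressing each chunk into few tokens.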

	Here, we provide:
	1. FAST+, our universal action tokenizer, trained on 1M real robot action sequences.
	2. Code for quickly training new action tokenizers on your custom dataset.

	## Installation

	FAST can be used as a convenient Hugging Face `AutoProcessor`. To use it, install the `transformers` package (and `scipy`, which provides the underlying DCT implementation):

	```bash
	pip install transformers scipy
	```

	## Using the Universal Action Tokenizer

	We recommend applying the tokenizer to 1-second action "chunks" that have been pre-normalized to the range [-1, 1]
	(we use quantile normalization for this step; see our paper for details). Both encoding and decoding support batched inputs.
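The paper normalizes actions with quantile normalization before tokenization. The sketch below is one way to do this. It assumes that the per-dimension 1st and 99th percentiles of the training actions map to -1 and 1, and that values outside that range are clipped. Both percentile choices and all helper names are assumptions for illustration, not the reference preprocessing code.

```python
import numpy as np


def fit_quantile_bounds(actions: np.ndarray, q_low: float = 1.0, q_high: float = 99.0):
    """Per-dimension quantile bounds from a [num_steps, action_dim] training array."""
    low = np.percentile(actions, q_low, axis=0)
    high = np.percentile(actions, q_high, axis=0)
    return low, high


def quantile_normalize(actions, low, high, eps: float = 1e-8):
    """Map [low, high] to [-1, 1] per dimension and clip outliers."""
    scaled = 2.0 * (actions - low) / np.maximum(high - low, eps) - 1.0
    return np.clip(scaled, -1.0, 1.0)


def quantile_unnormalize(normed, low, high):
    """Inverse mapping for decoded actions (clipped outliers are not recovered)."""
    return (normed + 1.0) / 2.0 * (high - low) + low


# Usage with dummy data: 1000 timesteps of a 14-dimensional action space.
rng = np.random.default_rng(0)
train_actions = rng.normal(size=(1000, 14))
low, high = fit_quantile_bounds(train_actions)
normed = quantile_normalize(train_actions, low, high)
restored = quantile_unnormalize(normed, low, high)
```

Keep `low` and `high` from training. Decoded actions come back in the normalized space, so they need the inverse mapping before being sent to a robot.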

	```python
	import numpy as np
	from transformers import AutoProcessor

	# Load the tokenizer from the Hugging Face Hub
	tokenizer = AutoProcessor.from_pretrained("physical-intelligence/fast", trust_remote_code=True)

	# Tokenize & decode action chunks (we use dummy data here; real chunks should be normalized to [-1, 1])
	action_data = np.random.rand(256, 50, 14)  # one batch of 256 chunks, each [time_horizon=50, action_dim=14]
	tokens = tokenizer(action_data)  # tokens = list[list[int]], one token list per chunk
	decoded_actions = tokenizer.decode(tokens)  # one [50, 14] action matrix per chunk
	```

	Note: During decoding, the tokenizer needs to reshape the decoded sequence of actions back into a `[time_horizon, action_dim]` matrix.
	There are two ways to provide these dimensions to the tokenizer: (1) they are saved automatically when you encode action chunks with the `tokenizer(...)` call, or (2) you can pass them manually as arguments to the `decode()` call.
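Because each chunk generally compresses to a different number of tokens, a batch of encoded chunks is ragged, and training an autoregressive VLA on batches usually requires padding. The sketch below assumes `tokens` is a list with one list of token ids per chunk. The helper names and `pad_id` are hypothetical; in a real VLA, the pad id must come from your model's vocabulary.

```python
import numpy as np


def pad_token_batch(token_lists, pad_id=0):
    """Right-pad ragged token lists into a [batch, max_len] id array plus a boolean mask."""
    max_len = max((len(t) for t in token_lists), default=0)
    ids = np.full((len(token_lists), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(token_lists), max_len), dtype=bool)
    for i, toks in enumerate(token_lists):
        ids[i, : len(toks)] = toks
        mask[i, : len(toks)] = True
    return ids, mask


def unpad_token_batch(ids, mask):
    """Recover the original ragged token lists, e.g. before calling decode()."""
    return [row[m].tolist() for row, m in zip(ids, mask)]


# Usage with made-up token ids standing in for real tokenizer output.
tokens = [[5, 6, 7], [8], [9, 10]]
ids, mask = pad_token_batch(tokens, pad_id=-1)
```

Stripping the padding with the mask before decoding ensures that pad ids are never interpreted as action tokens.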


	## Training a new Action Tokenizer on Your Own Data

	In our experiments, the FAST+ universal tokenizer worked well across a wide range of robot setups, action dimensions, and control frequencies.
	If you nevertheless want to train a custom FAST tokenizer on your own dataset, you can do so with the `.fit()` convenience function we provide.
	When called on a dataset of action chunks (of the same or different lengths), it returns a new tokenizer instance, which you can save and optionally push
	to the Hugging Face Hub. Training typically takes anywhere from a few seconds to a few minutes.

	```python
	import numpy as np
	from transformers import AutoProcessor

	# First, download the tokenizer from the Hugging Face Hub.
	# Here, we do not use the pre-trained FAST+ tokenizer weights, only the source code,
	# to train a new tokenizer on our own data.
	tokenizer = AutoProcessor.from_pretrained("physical-intelligence/fast", trust_remote_code=True)

	# Load your action data for tokenizer training.
	# Chunks do not need to have the same length; for simplicity, we use
	# dummy data of shape [num_chunks, time_horizon, action_dim].
	action_data = np.random.rand(4000, 50, 14)

	# Train the new tokenizer; depending on your dataset size, this can take a few minutes.
	tokenizer = tokenizer.fit(action_data)

	# Save the new tokenizer and optionally push it to the Hugging Face Hub
	# (pushing requires being logged in, e.g. via `huggingface-cli login`).
	tokenizer.save_pretrained("<your_local_path>")
	tokenizer.push_to_hub("YourUsername/my_new_tokenizer")
	```
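According to the `.fit()` description, the chunks may differ in length, whereas the example above uses a fixed-shape dummy array. The sketch below builds such a dataset: it splits episodes recorded at different control frequencies into non-overlapping 1-second chunks. The episode list, the `control_hz` values, and the helper name are hypothetical. Passing a plain Python list of arrays to `.fit()` is also an assumption.

```python
import numpy as np


def one_second_chunks(episodes, control_hz):
    """Split each [num_steps, action_dim] episode into non-overlapping 1-second chunks.

    Episodes recorded at different control frequencies yield chunks of different
    lengths. Trailing steps that do not fill a whole second are dropped.
    """
    chunks = []
    for actions, hz in zip(episodes, control_hz):
        for start in range(0, len(actions) - hz + 1, hz):
            chunks.append(actions[start : start + hz])
    return chunks


# Usage: a 50 Hz episode and a 15 Hz episode, both already normalized to [-1, 1].
rng = np.random.default_rng(0)
episodes = [rng.uniform(-1, 1, size=(120, 7)), rng.uniform(-1, 1, size=(45, 7))]
chunks = one_second_chunks(episodes, control_hz=[50, 15])
```

Non-overlapping windows keep every timestep in at most one training chunk. A sliding window with a smaller stride would produce more, but highly correlated, chunks.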