Hub documentation

File formats

Polars supports the following file formats when reading from Hugging Face: Parquet, CSV, and newline-delimited JSON (JSON Lines).

The examples below show the default settings only. See the Polars API reference for each function to view all available parameters.

Parquet

Parquet is the preferred file format because it stores the schema, including type information, within the file. This removes ambiguity when parsing and speeds up reading. To read a Parquet file in Polars, use the read_parquet function:

pl.read_parquet("hf://datasets/roneneldan/TinyStories/data/train-00000-of-00004-2d5a1467fff1081b.parquet")

CSV

The read_csv function can be used to read a CSV file:

pl.read_csv("hf://datasets/lhoestq/demo1/data/train.csv")

JSON

Polars supports reading newline-delimited JSON (also known as JSON Lines) with the read_ndjson function:

pl.read_ndjson("hf://datasets/proj-persona/PersonaHub/persona.jsonl")