AutoTrain documentation

Sentence Transformers

This task lets you easily train or fine-tune a Sentence Transformer model on your own dataset.

AutoTrain supports the following types of Sentence Transformer fine-tuning:

  • pair: two sentence columns, anchor and positive
  • pair_class: two sentence columns, premise and hypothesis, plus a target label
  • pair_score: two sentence columns, sentence1 and sentence2, plus a target score
  • triplet: three sentence columns, anchor, positive and negative
  • qa: two sentence columns, query and answer
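
As a rough illustration, the column names listed above can be collected into a lookup table and used to check a file's header before training. This is only a sketch: the names REQUIRED_COLUMNS and missing_columns are hypothetical helpers, not part of AutoTrain, and it assumes your data uses exactly the column names shown in this section.

```python
# Hypothetical helper, not part of AutoTrain: the column names come from
# the list of training types in this section.
REQUIRED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(task, header):
    """Return the expected columns for `task` that are absent from `header`."""
    if task not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown task type: {task!r}")
    present = set(header)
    return [col for col in REQUIRED_COLUMNS[task] if col not in present]
```

For example, missing_columns("triplet", ["anchor", "positive"]) reports that the negative column is absent, while a complete header yields an empty list.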

Data Format

Sentence Transformers fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.

pair

For pair training, the data should be in the following format:

anchor                                  | positive
----------------------------------------+--------------------
hello                                   | hi
how are you                             | I am fine
What is your name?                      | My name is Abhishek
Which is the best programming language? | Python
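
The pair rows above could be serialized to either accepted format along the following lines. This is a minimal sketch using only the Python standard library; the helper names to_csv and to_jsonl are illustrative, and it assumes one JSON object per line for JSONL with keys matching the column names.

```python
import csv
import io
import json

# The example rows from the pair table, one dict per row.
rows = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
    {"anchor": "What is your name?", "positive": "My name is Abhishek"},
    {"anchor": "Which is the best programming language?", "positive": "Python"},
]


def to_csv(rows, fieldnames):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_jsonl(rows):
    """Render rows as JSONL text: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


csv_text = to_csv(rows, ["anchor", "positive"])
jsonl_text = to_jsonl(rows)
```

Writing csv_text or jsonl_text to a file gives a dataset in the shape described here; the csv module takes care of quoting if a sentence contains a comma.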

pair_class

For pair_class training, the data should be in the following format:

premise                                 | hypothesis          | label
----------------------------------------+---------------------+------
hello                                   | hi                  | 1
how are you                             | I am fine           | 0
What is your name?                      | My name is Abhishek | 1
Which is the best programming language? | Python              | 1

pair_score

For pair_score training, the data should be in the following format:

sentence1                               | sentence2           | score
----------------------------------------+---------------------+------
hello                                   | hi                  | 0.8
how are you                             | I am fine           | 0.2
What is your name?                      | My name is Abhishek | 0.9
Which is the best programming language? | Python              | 0.7
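
Because the score column holds numbers, it can help to confirm that every value parses as a float before training. The sketch below assumes a CSV with the three column names shown above; read_pair_score is a hypothetical helper, and it does not enforce any score range since this section does not state one.

```python
import csv
import io


def read_pair_score(csv_text):
    """Parse pair_score CSV text into (sentence1, sentence2, score) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    # Data rows start on line 2; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            score = float(row["score"])
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: invalid or missing score") from exc
        records.append((row["sentence1"], row["sentence2"], score))
    return records


sample = "sentence1,sentence2,score\nhello,hi,0.8\nhow are you,I am fine,0.2\n"
records = read_pair_score(sample)
```

A non-numeric value such as "high" in the score column raises a ValueError that names the offending line.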

triplet

For triplet training, the data should be in the following format:

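anchor                                  | positive            | negative
----------------------------------------+---------------------+-----------------
hello                                   | hi                  | bye
how are you                             | I am fine           | I am not fine
What is your name?                      | My name is Abhishek | Whats it to you?
Which is the best programming language? | Python              | Javascript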

qa

For qa training, the data should be in the following format:

query                                   | answer
----------------------------------------+--------------------
hello                                   | hi
how are you                             | I am fine
What is your name?                      | My name is Abhishek
Which is the best programming language? | Python