NLP Course documentation

Introduction to Argilla


In Chapter 5 you learnt how to build a dataset using the 🤗 Datasets library and in Chapter 6 you explored how to fine-tune models for some common NLP tasks. In this chapter, you will learn how to use Argilla to annotate and curate datasets that you can use to train and evaluate your models.

The key to training models that perform well is to have high-quality data. Although there are some good datasets on the Hub that you could use to train and evaluate your models, these may not be relevant for your specific application or use case. In that scenario, you may want to build and curate a dataset of your own, and Argilla will help you do this efficiently.

[Screenshot: Argilla sign-in page]

With Argilla you can:

  • turn unstructured data into structured data to be used in NLP tasks.
  • curate a dataset, going from low-quality to high-quality data.
  • gather human feedback for LLMs and multimodal models.
  • invite experts to collaborate with you in Argilla, or crowdsource annotations!

Here are some of the things that you will learn in this chapter:

  • How to set up your own Argilla instance.
  • How to load a dataset and configure it based on some popular NLP tasks.
  • How to use the Argilla UI to annotate your dataset.
  • How to use your curated dataset and export it to the Hub.