Unit 4. Build a music genre classifier
What you’ll learn and what you’ll build
Audio classification is one of the most common applications of transformers in audio and speech processing. Like other classification tasks in machine learning, it involves assigning one or more labels to an audio recording based on its content. For example, in the case of speech, we might want to detect when wake words like “Hey Siri” are spoken, or infer a keyword like “temperature” from a spoken query like “What is the weather today?”. Environmental sounds provide another example, where we might want to automatically distinguish between sounds such as “car horn”, “siren”, and “dog barking”.
In this section, we’ll look at how pre-trained audio transformers can be applied to a range of audio classification tasks. We’ll then fine-tune a transformer model for music classification, assigning songs to genres like “pop” and “rock”. Genre classification is an essential part of music streaming platforms like Spotify, which use it to recommend songs similar to the ones a user is listening to.
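As a quick preview, an off-the-shelf checkpoint can already be used for audio classification through the 🤗 Transformers pipeline. Below is a minimal sketch; the checkpoint name and the audio file path are placeholders, and any audio-classification model from the Hub could be substituted:

```python
from transformers import pipeline

# Minimal sketch: the checkpoint below is one example of an audio-classification
# model on the Hub (an assumption, not necessarily the one used later in this unit).
classifier = pipeline(
    "audio-classification",
    model="MIT/ast-finetuned-audioset-10-10-0.4593",
)

# "song.wav" is a placeholder path to a local audio file.
predictions = classifier("song.wav")
print(predictions)  # list of {"score": ..., "label": ...} dictionaries
```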
By the end of this section, you’ll know how to:
- Find suitable pre-trained models for audio classification tasks
- Use the 🤗 Datasets library and the Hugging Face Hub to select audio classification datasets
- Fine-tune a pretrained model to classify songs by genre
- Build a Gradio demo that lets you classify your own songs
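To give a flavor of that last item, here is a hedged sketch of how such a demo might look with Gradio. The checkpoint name is again a placeholder for the model you’ll fine-tune in this unit:

```python
import gradio as gr
from transformers import pipeline

# Placeholder checkpoint — in this unit you'll swap in your own fine-tuned model.
classifier = pipeline(
    "audio-classification",
    model="MIT/ast-finetuned-audioset-10-10-0.4593",
)


def classify_audio(filepath):
    # The pipeline returns a list of {"label": ..., "score": ...} dictionaries,
    # which gr.Label expects as a {label: score} mapping.
    predictions = classifier(filepath)
    return {pred["label"]: pred["score"] for pred in predictions}


demo = gr.Interface(
    fn=classify_audio,
    inputs=gr.Audio(type="filepath"),     # upload or record a clip
    outputs=gr.Label(num_top_classes=5),  # show the top predicted labels
)
demo.launch()
```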