{ "cells": [ { "cell_type": "markdown", "id": "75b58048-7d14-4fc6-8085-1fc08c81b4a6", "metadata": { "id": "75b58048-7d14-4fc6-8085-1fc08c81b4a6" }, "source": [ "# Fine-Tune Whisper For Multilingual ASR with ๐ค Transformers" ] }, { "cell_type": "markdown", "id": "fbfa8ad5-4cdc-4512-9058-836cbbf65e1a", "metadata": { "id": "fbfa8ad5-4cdc-4512-9058-836cbbf65e1a" }, "source": [ "In this Colab, we present a step-by-step guide on how to fine-tune Whisper \n", "for any multilingual ASR dataset using Hugging Face ๐ค Transformers. This is a \n", "more \"hands-on\" version of the accompanying [blog post](https://huggingface.co/blog/fine-tune-whisper). \n", "For a more in-depth explanation of Whisper, the Common Voice dataset and the theory behind fine-tuning, the reader is advised to refer to the blog post." ] }, { "cell_type": "markdown", "id": "afe0d503-ae4e-4aa7-9af4-dbcba52db41e", "metadata": { "id": "afe0d503-ae4e-4aa7-9af4-dbcba52db41e" }, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "9ae91ed4-9c3e-4ade-938e-f4c2dcfbfdc0", "metadata": { "id": "9ae91ed4-9c3e-4ade-938e-f4c2dcfbfdc0" }, "source": [ "Whisper is a pre-trained model for automatic speech recognition (ASR) \n", "published in [September 2022](https://openai.com/blog/whisper/) by the authors \n", "Alec Radford et al. from OpenAI. Unlike many of its predecessors, such as \n", "[Wav2Vec 2.0](https://arxiv.org/abs/2006.11477), which are pre-trained \n", "on un-labelled audio data, Whisper is pre-trained on a vast quantity of \n", "**labelled** audio-transcription data, 680,000 hours to be precise. \n", "This is an order of magnitude more data than the un-labelled audio data used \n", "to train Wav2Vec 2.0 (60,000 hours). What is more, 117,000 hours of this \n", "pre-training data is multilingual ASR data. This results in checkpoints \n", "that can be applied to over 96 languages, many of which are considered \n", "_low-resource_.\n", "\n", "When scaled to 680,000 hours of labelled pre-training data, Whisper models \n", "demonstrate a strong ability to generalise to many datasets and domains.\n", "The pre-trained checkpoints achieve competitive results to state-of-the-art \n", "ASR systems, with near 3% word error rate (WER) on the test-clean subset of \n", "LibriSpeech ASR and a new state-of-the-art on TED-LIUM with 4.7% WER (_c.f._ \n", "Table 8 of the [Whisper paper](https://cdn.openai.com/papers/whisper.pdf)).\n", "The extensive multilingual ASR knowledge acquired by Whisper during pre-training \n", "can be leveraged for other low-resource languages; through fine-tuning, the \n", "pre-trained checkpoints can be adapted for specific datasets and languages \n", "to further improve upon these results. We'll show just how Whisper can be fine-tuned \n", "for low-resource languages in this Colab." ] }, { "cell_type": "markdown", "id": "e59b91d6-be24-4b5e-bb38-4977ea143a72", "metadata": { "id": "e59b91d6-be24-4b5e-bb38-4977ea143a72" }, "source": [ "" ] }, { "cell_type": "markdown", "id": "21b6316e-8a55-4549-a154-66d3da2ab74a", "metadata": { "id": "21b6316e-8a55-4549-a154-66d3da2ab74a" }, "source": [ "The Whisper checkpoints come in five configurations of varying model sizes.\n", "The smallest four are trained on either English-only or multilingual data.\n", "The largest checkpoint is multilingual only. All nine of the pre-trained checkpoints \n", "are available on the [Hugging Face Hub](https://huggingface.co/models?search=openai/whisper). The \n", "checkpoints are summarised in the following table with links to the models on the Hub:\n", "\n", "| Size | Layers | Width | Heads | Parameters | English-only | Multilingual |\n", "|--------|--------|-------|-------|------------|------------------------------------------------------|---------------------------------------------------|\n", "| tiny | 4 | 384 | 6 | 39 M | [โ](https://huggingface.co/openai/whisper-tiny.en) | [โ](https://huggingface.co/openai/whisper-tiny.) |\n", "| base | 6 | 512 | 8 | 74 M | [โ](https://huggingface.co/openai/whisper-base.en) | [โ](https://huggingface.co/openai/whisper-base) |\n", "| small | 12 | 768 | 12 | 244 M | [โ](https://huggingface.co/openai/whisper-small.en) | [โ](https://huggingface.co/openai/whisper-small) |\n", "| medium | 24 | 1024 | 16 | 769 M | [โ](https://huggingface.co/openai/whisper-medium.en) | [โ](https://huggingface.co/openai/whisper-medium) |\n", "| large | 32 | 1280 | 20 | 1550 M | x | [โ](https://huggingface.co/openai/whisper-large) |\n", "\n", "For demonstration purposes, we'll fine-tune the multilingual version of the \n", "[`\"small\"`](https://huggingface.co/openai/whisper-small) checkpoint with 244M params (~= 1GB). \n", "As for our data, we'll train and evaluate our system on a low-resource language \n", "taken from the [Common Voice](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0)\n", "dataset. We'll show that with as little as 8 hours of fine-tuning data, we can achieve \n", "strong performance in this language." ] }, { "cell_type": "markdown", "id": "3a680dfc-cbba-4f6c-8a1f-e1a5ff3f123a", "metadata": { "id": "3a680dfc-cbba-4f6c-8a1f-e1a5ff3f123a" }, "source": [ "------------------------------------------------------------------------\n", "\n", "\\\\({}^1\\\\) The name Whisper follows from the acronym โWSPSRโ, which stands for โWeb-scale Supervised Pre-training for Speech Recognitionโ." ] }, { "cell_type": "markdown", "id": "55fb8d21-df06-472a-99dd-b59567be6dad", "metadata": { "id": "55fb8d21-df06-472a-99dd-b59567be6dad" }, "source": [ "## Prepare Environment" ] }, { "cell_type": "markdown", "id": "844a4861-929c-4762-b29b-80b1e95aba4b", "metadata": { "id": "844a4861-929c-4762-b29b-80b1e95aba4b" }, "source": [ "First of all, let's try to secure a decent GPU for our Colab! Unfortunately, it's becoming much harder to get access to a good GPU with the free version of Google Colab. However, with Google Colab Pro one should have no issues in being allocated a V100 or P100 GPU.\n", "\n", "To get a GPU, click _Runtime_ -> _Change runtime type_, then change _Hardware accelerator_ from _None_ to _GPU_." ] }, { "cell_type": "markdown", "id": "9abea5d7-9d54-434b-a6bd-399d1b3c6c1a", "metadata": { "id": "9abea5d7-9d54-434b-a6bd-399d1b3c6c1a" }, "source": [ "We can verify that we've been assigned a GPU and view its specifications:" ] }, { "cell_type": "code", "execution_count": null, "id": "95048026-a3b7-43f0-a274-1bad65e407b4", "metadata": { "id": "95048026-a3b7-43f0-a274-1bad65e407b4", "outputId": "50fb4e06-a255-4303-e14a-a9c171e4c5d1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sun Dec 18 21:59:00 2022 \n", "+-----------------------------------------------------------------------------+\n", "| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |\n", "|-------------------------------+----------------------+----------------------+\n", "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", "| | | MIG M. |\n", "|===============================+======================+======================|\n", "| 0 NVIDIA A100-SXM... On | 00000000:06:00.0 Off | 0 |\n", "| N/A 32C P0 47W / 400W | 0MiB / 40960MiB | 0% Default |\n", "| | | Disabled |\n", "+-------------------------------+----------------------+----------------------+\n", " \n", "+-----------------------------------------------------------------------------+\n", "| Processes: |\n", "| GPU GI CI PID Type Process name GPU Memory |\n", "| ID ID Usage |\n", "|=============================================================================|\n", "| No running processes found |\n", "+-----------------------------------------------------------------------------+\n" ] } ], "source": [ "gpu_info = !nvidia-smi\n", "gpu_info = '\\n'.join(gpu_info)\n", "if gpu_info.find('failed') >= 0:\n", " print('Not connected to a GPU')\n", "else:\n", " print(gpu_info)" ] }, { "cell_type": "markdown", "id": "9cd52dc1-ade1-44bb-a2d7-2ed98f110fed", "metadata": { "id": "9cd52dc1-ade1-44bb-a2d7-2ed98f110fed" }, "source": [ "Next, we need to update the Unix package `ffmpeg` to version 4:" ] }, { "cell_type": "code", "execution_count": null, "id": "69ee227d-60c5-44bf-b04d-c2092f997454", "metadata": { "id": "69ee227d-60c5-44bf-b04d-c2092f997454", "outputId": "dfa39498-c89b-4385-dd89-93ca025fd7e8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;31mE: \u001b[0mCould not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)\u001b[0m\n", "\u001b[1;31mE: \u001b[0mUnable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?\u001b[0m\n", "Reading package lists... Done\n", "\u001b[1;31mE: \u001b[0mCould not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)\u001b[0m\n", "\u001b[1;31mE: \u001b[0mUnable to lock directory /var/lib/apt/lists/\u001b[0m\n", "\u001b[1;33mW: \u001b[0mProblem unlinking the file /var/cache/apt/pkgcache.bin - RemoveCaches (13: Permission denied)\u001b[0m\n", "\u001b[1;33mW: \u001b[0mProblem unlinking the file /var/cache/apt/srcpkgcache.bin - RemoveCaches (13: Permission denied)\u001b[0m\n", "Error: must run as root\n" ] } ], "source": [ "!apt install -y ffmpeg\n", "!apt update\n", "!add-apt-repository -y ppa:jonathonf/ffmpeg-4\n" ] }, { "cell_type": "markdown", "id": "1d85d613-1c7e-46ac-9134-660bbe7ebc9d", "metadata": { "id": "1d85d613-1c7e-46ac-9134-660bbe7ebc9d" }, "source": [ "We'll employ several popular Python packages to fine-tune the Whisper model.\n", "We'll use `datasets` to download and prepare our training data and \n", "`transformers` to load and train our Whisper model. We'll also require\n", "the `soundfile` package to pre-process audio files, `evaluate` and `jiwer` to\n", "assess the performance of our model. Finally, we'll\n", "use `gradio` to build a flashy demo of our fine-tuned model." ] }, { "cell_type": "code", "execution_count": null, "id": "e68ea9f8-9b61-414e-8885-3033b67c2850", "metadata": { "id": "e68ea9f8-9b61-414e-8885-3033b67c2850", "outputId": "f69f6af2-21ac-480a-8d1b-2d55e4e0cf54" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m22.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Defaulting to user installation because normal site-packages is not writeable\n", "Collecting git+https://github.com/huggingface/transformers\n", " Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-u8_6wo82\n", " Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-u8_6wo82\n", " Resolved https://github.com/huggingface/transformers to commit 7032e0203262ebb2ebf55da8d2e01f873973e835\n", " Installing build dependencies ... \u001b[?25ldone\n", "\u001b[?25h Getting requirements to build wheel ... \u001b[?25ldone\n", "\u001b[?25h Preparing metadata (pyproject.toml) ... \u001b[?25ldone\n", "\u001b[?25hRequirement already satisfied: huggingface-hub<1.0,>=0.10.0 in ./.local/lib/python3.8/site-packages (from transformers==4.26.0.dev0) (0.11.1)\n", "Requirement already satisfied: numpy>=1.17 in ./.local/lib/python3.8/site-packages (from transformers==4.26.0.dev0) (1.23.4)\n", "Requirement already satisfied: tqdm>=4.27 in ./.local/lib/python3.8/site-packages (from transformers==4.26.0.dev0) (4.64.1)\n", "Requirement already satisfied: filelock in /usr/lib/python3/dist-packages (from transformers==4.26.0.dev0) (3.0.12)\n", "Requirement already satisfied: pyyaml>=5.1 in /usr/lib/python3/dist-packages (from transformers==4.26.0.dev0) (5.3.1)\n", "Requirement already satisfied: requests in ./.local/lib/python3.8/site-packages (from transformers==4.26.0.dev0) (2.28.1)\n", "Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in ./.local/lib/python3.8/site-packages (from transformers==4.26.0.dev0) (0.13.2)\n", "Requirement already satisfied: packaging>=20.0 in ./.local/lib/python3.8/site-packages (from transformers==4.26.0.dev0) (21.3)\n", "Requirement already satisfied: regex!=2019.12.17 in ./.local/lib/python3.8/site-packages (from transformers==4.26.0.dev0) (2022.10.31)\n", "Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.local/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.10.0->transformers==4.26.0.dev0) (4.4.0)\n", "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/lib/python3/dist-packages (from packaging>=20.0->transformers==4.26.0.dev0) (2.4.6)\n", "Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.local/lib/python3.8/site-packages (from requests->transformers==4.26.0.dev0) (1.26.13)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->transformers==4.26.0.dev0) (2019.11.28)\n", "Requirement already satisfied: charset-normalizer<3,>=2 in ./.local/lib/python3.8/site-packages (from requests->transformers==4.26.0.dev0) (2.1.1)\n", "Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->transformers==4.26.0.dev0) (2.8)\n", "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m22.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: librosa in ./.local/lib/python3.8/site-packages (0.9.2)\n", "Requirement already satisfied: packaging>=20.0 in ./.local/lib/python3.8/site-packages (from librosa) (21.3)\n", "Requirement already satisfied: audioread>=2.1.9 in ./.local/lib/python3.8/site-packages (from librosa) (3.0.0)\n", "Requirement already satisfied: joblib>=0.14 in ./.local/lib/python3.8/site-packages (from librosa) (1.2.0)\n", "Requirement already satisfied: numpy>=1.17.0 in ./.local/lib/python3.8/site-packages (from librosa) (1.23.4)\n", "Requirement already satisfied: numba>=0.45.1 in ./.local/lib/python3.8/site-packages (from librosa) (0.56.4)\n", "Requirement already satisfied: soundfile>=0.10.2 in ./.local/lib/python3.8/site-packages (from librosa) (0.11.0)\n", "Requirement already satisfied: decorator>=4.0.10 in /usr/lib/python3/dist-packages (from librosa) (4.4.2)\n", "Requirement already satisfied: pooch>=1.0 in ./.local/lib/python3.8/site-packages (from librosa) (1.6.0)\n", "Requirement already satisfied: scikit-learn>=0.19.1 in /usr/lib/python3/dist-packages (from librosa) (0.22.2.post1)\n", "Requirement already satisfied: resampy>=0.2.2 in ./.local/lib/python3.8/site-packages (from librosa) (0.4.2)\n", "Requirement already satisfied: scipy>=1.2.0 in ./.local/lib/python3.8/site-packages (from librosa) (1.9.3)\n", "Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in ./.local/lib/python3.8/site-packages (from numba>=0.45.1->librosa) (0.39.1)\n", "Requirement already satisfied: importlib-metadata in ./.local/lib/python3.8/site-packages (from numba>=0.45.1->librosa) (5.0.0)\n", "Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from numba>=0.45.1->librosa) (45.2.0)\n", "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/lib/python3/dist-packages (from packaging>=20.0->librosa) (2.4.6)\n", "Requirement already satisfied: requests>=2.19.0 in ./.local/lib/python3.8/site-packages (from pooch>=1.0->librosa) (2.28.1)\n", "Requirement already satisfied: appdirs>=1.3.0 in /usr/lib/python3/dist-packages (from pooch>=1.0->librosa) (1.4.3)\n", "Requirement already satisfied: cffi>=1.0 in /usr/lib/python3/dist-packages (from soundfile>=0.10.2->librosa) (1.14.0)\n", "Requirement already satisfied: charset-normalizer<3,>=2 in ./.local/lib/python3.8/site-packages (from requests>=2.19.0->pooch>=1.0->librosa) (2.1.1)\n", "Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa) (2.8)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests>=2.19.0->pooch>=1.0->librosa) (2019.11.28)\n", "Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.local/lib/python3.8/site-packages (from requests>=2.19.0->pooch>=1.0->librosa) (1.26.13)\n", "Requirement already satisfied: zipp>=0.5 in /usr/lib/python3/dist-packages (from importlib-metadata->numba>=0.45.1->librosa) (1.0.0)\n", "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m22.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m22.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: jiwer in ./.local/lib/python3.8/site-packages (2.5.1)\n", "Requirement already satisfied: levenshtein==0.20.2 in ./.local/lib/python3.8/site-packages (from jiwer) (0.20.2)\n", "Requirement already satisfied: rapidfuzz<3.0.0,>=2.3.0 in ./.local/lib/python3.8/site-packages (from levenshtein==0.20.2->jiwer) (2.13.6)\n", "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m22.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: gradio in ./.local/lib/python3.8/site-packages (3.14.0)\n", "Requirement already satisfied: pandas in ./.local/lib/python3.8/site-packages (from gradio) (1.5.1)\n", "Requirement already satisfied: pydantic in ./.local/lib/python3.8/site-packages (from gradio) (1.10.2)\n", "Requirement already satisfied: jinja2 in ./.local/lib/python3.8/site-packages (from gradio) (3.1.2)\n", "Requirement already satisfied: matplotlib in ./.local/lib/python3.8/site-packages (from gradio) (3.5.3)\n", "Requirement already satisfied: pillow in /usr/lib/python3/dist-packages (from gradio) (7.0.0)\n", "Requirement already satisfied: fastapi in ./.local/lib/python3.8/site-packages (from gradio) (0.88.0)\n", "Requirement already satisfied: fsspec in ./.local/lib/python3.8/site-packages (from gradio) (2022.11.0)\n", "Requirement already satisfied: markupsafe in ./.local/lib/python3.8/site-packages (from gradio) (2.1.1)\n", "Requirement already satisfied: markdown-it-py[linkify,plugins] in ./.local/lib/python3.8/site-packages (from gradio) (2.1.0)\n", "Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from gradio) (5.3.1)\n", "Requirement already satisfied: pydub in ./.local/lib/python3.8/site-packages (from gradio) (0.25.1)\n", "Requirement already satisfied: aiohttp in ./.local/lib/python3.8/site-packages (from gradio) (3.8.3)\n", "Requirement already satisfied: httpx in ./.local/lib/python3.8/site-packages (from gradio) (0.23.1)\n", "Requirement already satisfied: requests in ./.local/lib/python3.8/site-packages (from gradio) (2.28.1)\n", "Requirement already satisfied: uvicorn in ./.local/lib/python3.8/site-packages (from gradio) (0.20.0)\n", "Requirement already satisfied: websockets>=10.0 in ./.local/lib/python3.8/site-packages (from gradio) (10.4)\n", "Requirement already satisfied: pycryptodome in ./.local/lib/python3.8/site-packages (from gradio) (3.16.0)\n", "Requirement already satisfied: python-multipart in ./.local/lib/python3.8/site-packages (from gradio) (0.0.5)\n", "Requirement already satisfied: numpy in ./.local/lib/python3.8/site-packages (from gradio) (1.23.4)\n", "Requirement already satisfied: altair in ./.local/lib/python3.8/site-packages (from gradio) (4.2.0)\n", "Requirement already satisfied: ffmpy in ./.local/lib/python3.8/site-packages (from gradio) (0.3.0)\n", "Requirement already satisfied: orjson in ./.local/lib/python3.8/site-packages (from gradio) (3.8.3)\n", "Requirement already satisfied: yarl<2.0,>=1.0 in ./.local/lib/python3.8/site-packages (from aiohttp->gradio) (1.8.2)\n", "Requirement already satisfied: aiosignal>=1.1.2 in ./.local/lib/python3.8/site-packages (from aiohttp->gradio) (1.3.1)\n", "Requirement already satisfied: charset-normalizer<3.0,>=2.0 in ./.local/lib/python3.8/site-packages (from aiohttp->gradio) (2.1.1)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in ./.local/lib/python3.8/site-packages (from aiohttp->gradio) (6.0.3)\n", "Requirement already satisfied: attrs>=17.3.0 in /usr/lib/python3/dist-packages (from aiohttp->gradio) (19.3.0)\n", "Requirement already satisfied: frozenlist>=1.1.1 in ./.local/lib/python3.8/site-packages (from aiohttp->gradio) (1.3.3)\n", "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in ./.local/lib/python3.8/site-packages (from aiohttp->gradio) (4.0.2)\n", "Requirement already satisfied: jsonschema>=3.0 in /usr/lib/python3/dist-packages (from altair->gradio) (3.2.0)\n", "Requirement already satisfied: entrypoints in /usr/lib/python3/dist-packages (from altair->gradio) (0.3)\n", "Requirement already satisfied: toolz in /usr/lib/python3/dist-packages (from altair->gradio) (0.9.0)\n", "Requirement already satisfied: python-dateutil>=2.8.1 in ./.local/lib/python3.8/site-packages (from pandas->gradio) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in ./.local/lib/python3.8/site-packages (from pandas->gradio) (2022.5)\n", "Requirement already satisfied: starlette==0.22.0 in ./.local/lib/python3.8/site-packages (from fastapi->gradio) (0.22.0)\n", "Requirement already satisfied: anyio<5,>=3.4.0 in ./.local/lib/python3.8/site-packages (from starlette==0.22.0->fastapi->gradio) (3.6.2)\n", "Requirement already satisfied: typing-extensions>=3.10.0 in ./.local/lib/python3.8/site-packages (from starlette==0.22.0->fastapi->gradio) (4.4.0)\n", "Requirement already satisfied: sniffio in ./.local/lib/python3.8/site-packages (from httpx->gradio) (1.3.0)\n", "Requirement already satisfied: certifi in /usr/lib/python3/dist-packages (from httpx->gradio) (2019.11.28)\n", "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in ./.local/lib/python3.8/site-packages (from httpx->gradio) (1.5.0)\n", "Requirement already satisfied: httpcore<0.17.0,>=0.15.0 in ./.local/lib/python3.8/site-packages (from httpx->gradio) (0.16.2)\n", "Requirement already satisfied: mdurl~=0.1 in ./.local/lib/python3.8/site-packages (from markdown-it-py[linkify,plugins]->gradio) (0.1.2)\n", "Requirement already satisfied: mdit-py-plugins in ./.local/lib/python3.8/site-packages (from markdown-it-py[linkify,plugins]->gradio) (0.3.3)\n", "Requirement already satisfied: linkify-it-py~=1.0 in ./.local/lib/python3.8/site-packages (from markdown-it-py[linkify,plugins]->gradio) (1.0.3)\n", "Requirement already satisfied: cycler>=0.10 in /usr/lib/python3/dist-packages (from matplotlib->gradio) (0.10.0)\n", "Requirement already satisfied: pyparsing>=2.2.1 in /usr/lib/python3/dist-packages (from matplotlib->gradio) (2.4.6)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/lib/python3/dist-packages (from matplotlib->gradio) (1.0.1)\n", "Requirement already satisfied: fonttools>=4.22.0 in ./.local/lib/python3.8/site-packages (from matplotlib->gradio) (4.38.0)\n", "Requirement already satisfied: packaging>=20.0 in ./.local/lib/python3.8/site-packages (from matplotlib->gradio) (21.3)\n", "Requirement already satisfied: six>=1.4.0 in /usr/lib/python3/dist-packages (from python-multipart->gradio) (1.14.0)\n", "Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->gradio) (2.8)\n", "Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.local/lib/python3.8/site-packages (from requests->gradio) (1.26.13)\n", "Requirement already satisfied: click>=7.0 in /usr/lib/python3/dist-packages (from uvicorn->gradio) (7.0)\n", "Requirement already satisfied: h11>=0.8 in ./.local/lib/python3.8/site-packages (from uvicorn->gradio) (0.14.0)\n", "Requirement already satisfied: uc-micro-py in ./.local/lib/python3.8/site-packages (from linkify-it-py~=1.0->markdown-it-py[linkify,plugins]->gradio) (1.0.1)\n", "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: \u001b[0m\u001b[31;49m22.3\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m22.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "!pip install datasets>=2.6.1\n", "!pip install git+https://github.com/huggingface/transformers\n", "!pip install librosa\n", "!pip install evaluate>=0.30\n", "!pip install jiwer\n", "!pip install gradio" ] }, { "cell_type": "markdown", "id": "1f60d173-8de1-4ed7-bc9a-d281cf237203", "metadata": { "id": "1f60d173-8de1-4ed7-bc9a-d281cf237203" }, "source": [ "We strongly advise you to upload model checkpoints directly the [Hugging Face Hub](https://huggingface.co/) \n", "whilst training. The Hub provides:\n", "- Integrated version control: you can be sure that no model checkpoint is lost during training.\n", "- Tensorboard logs: track important metrics over the course of training.\n", "- Model cards: document what a model does and its intended use cases.\n", "- Community: an easy way to share and collaborate with the community!\n", "\n", "Linking the notebook to the Hub is straightforward - it simply requires entering your \n", "Hub authentication token when prompted. Find your Hub authentication token [here](https://huggingface.co/settings/tokens):" ] }, { "cell_type": "code", "execution_count": null, "id": "b045a39e-2a3e-4153-bdb5-281500bcd348", "metadata": { "id": "b045a39e-2a3e-4153-bdb5-281500bcd348", "colab": { "referenced_widgets": [ "fba6cbb348874ff2a0fecdd6f44a27ac" ] }, "outputId": "7c6a0b99-5b20-48be-84b0-bef65b7b5007" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "fba6cbb348874ff2a0fecdd6f44a27ac", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='
Step | \n", "Training Loss | \n", "Validation Loss | \n", "Wer | \n", "
---|---|---|---|
4000 | \n", "0.147600 | \n", "0.322550 | \n", "44.976586 | \n", "
"
],
"text/plain": [
"