---
title: README
emoji: π
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---
[diarizers-community](https://huggingface.co/diarizers-community) aims to promote speaker diarization on the Hugging Face Hub. It contains:
- A collection of [multilingual speaker diarization datasets](https://huggingface.co/collections/diarizers-community/speaker-diarization-datasets-66261b8d571552066e003788) that are compatible with the [diarizers](https://github.com/huggingface/diarizers) library. They have been processed using [diarizers scripts](https://github.com/huggingface/diarizers/blob/main/datasets/README.md).
The available datasets are Callhome (Japanese, Chinese, German, Spanish, English), AMI Corpus (English), VoxConverse (English), and Simsamu (French). We aim to add more datasets in the future to better support speaker diarization on the Hub.
- A collection of multilingual [fine-tuned segmentation model](https://huggingface.co/collections/diarizers-community/models-66261d0f9277b825c807ff2a) baselines compatible with [pyannote](https://github.com/pyannote/pyannote-audio).
Each model has been fine-tuned on a specific Callhome language subset. On multilingual data, they achieve better performance than [pyannote](https://github.com/pyannote/pyannote-audio)'s pre-trained [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) model (see the benchmark for details on model performance).
Together with diarizers-community, we release:
- [diarizers](https://github.com/huggingface/diarizers/tree/main), a library for fine-tuning [pyannote](https://github.com/pyannote/pyannote-audio) speaker diarization models using the Hugging Face ecosystem.
- A Google Colab [notebook](https://colab.research.google.com/github/kamilakesbi/notebooks/blob/main/fine_tune_pyannote.ipynb) with a step-by-step guide on how to use diarizers.
**Benchmark**
| [Callhome](https://huggingface.co/datasets/diarizers-community/callhome) test dataset | Model | DER | False alarm | Missed detection | Confusion |
| ------------------------| ------------- | ------------- | ------------- | --------------- | ------------- |
| Japanese | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 25.44 | **2.30** | 17.45 | 5.69 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-jpn) | **18.23** | 6.31 | **6.91** | **5.01** |
| Spanish | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 33.44 | **2.59** | 25.19 | **5.66** |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-spa) | **25.72** | 6.87 | **12.73** | 6.12 |
| English | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 22.16 | **6.29** | 10.97 | 4.90 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-eng) | **18.40** | 7.10 | **6.98** | **4.32** |
| German | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 21.90 | **3.10** | 14.25 | 4.55 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-deu) | **16.75** | 5.00 | **7.75** | **4.00** |
| Chinese | [Pretrained](https://huggingface.co/pyannote/segmentation-3.0) | 19.73 | **4.81** | 9.82 | 5.11 |
| | [Fine-tuned](https://huggingface.co/diarizers-community/speaker-segmentation-fine-tuned-callhome-zho) | **15.95** | 5.04 | **7.24** | **3.68** |
All results are percentages; lower is better, and the better value in each pretrained/fine-tuned pair is shown in bold. They were obtained using the [test script](https://github.com/huggingface/diarizers/blob/main/test_segmentation.py) from diarizers.
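
To help read the table, here is a minimal sketch of how the columns relate. It assumes the usual decomposition of the diarization error rate (DER) as the sum of false alarm, missed detection, and speaker confusion, which the reported numbers follow up to rounding. The helper names `der_from_components` and `relative_reduction` are hypothetical and are not part of diarizers or pyannote; the values are copied from the Japanese and Chinese rows above.

```python
# Illustrative only: helper names are hypothetical, not part of diarizers or pyannote.
# Each entry is (DER, false alarm, missed detection, confusion) in %, copied from the table.
ROWS = {
    ("Japanese", "pretrained"): (25.44, 2.30, 17.45, 5.69),
    ("Japanese", "fine-tuned"): (18.23, 6.31, 6.91, 5.01),
    ("Chinese", "pretrained"): (19.73, 4.81, 9.82, 5.11),
    ("Chinese", "fine-tuned"): (15.95, 5.04, 7.24, 3.68),
}


def der_from_components(false_alarm, missed_detection, confusion):
    """Sum the three error components (all in %) to get the DER."""
    return false_alarm + missed_detection + confusion


def relative_reduction(baseline_der, new_der):
    """Fraction by which new_der improves on baseline_der."""
    return (baseline_der - new_der) / baseline_der


for (language, model), (der, fa, miss, conf) in ROWS.items():
    recomputed = der_from_components(fa, miss, conf)
    # The reported components are rounded, so allow a small tolerance.
    assert abs(recomputed - der) < 0.02, (language, model)
    print(f"{language:8s} {model:10s} reported={der:.2f} recomputed={recomputed:.2f}")

for language in ("Japanese", "Chinese"):
    baseline = ROWS[(language, "pretrained")][0]
    tuned = ROWS[(language, "fine-tuned")][0]
    print(f"{language}: relative DER reduction = {relative_reduction(baseline, tuned):.1%}")
```

The decomposition also shows where the gains come from: fine-tuning raises false alarms slightly in every language but cuts missed detections by a larger margin, so the overall DER drops.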