Jenthe/ECAPA2

Jenthe committed on Oct 16, 2023

Commit ab62acf

1 Parent(s): ebbe4a8

Update README.md

Files changed (1)

README.md +16 -7

@@ -4,7 +4,16 @@ license: cc-by-nc-4.0
 # ECAPA2 Speaker Embedding Extractor
-ECAPA2 is a hybrid neural network architecture and training strategy for speaker recognition. The provided model is pre-trained and has an easy-to-use API to extract speaker embeddings.
 <!---
 ## Model Details
@@ -83,12 +92,12 @@ feature = ecapa2_model(audio, label='embedding|gfe_1|pool')
 The following table describes the available features:
-| Feature ID| Description |
-| ----------- | ----------- |
-| gfe_1, gfe_2 | Mean and variance of frame-level features as indicated in Figure 1, extracted before ReLU and BatchNorm layer.
-| pool | Pooled statistics (mean and variance) before the bottleneck speaker embedding layer, extracted before ReLU layer.
-| attention | Same as the pooled statistics but with the attention weights applied.
-| embedding | The standard ECAPA2 speaker embedding.
 <!--
 The following table describes the available features:

 # ECAPA2 Speaker Embedding Extractor
+ECAPA2 is a hybrid neural network architecture and training strategy for generating robust speaker embeddings.
+The provided pre-trained model has an easy-to-use API to extract speaker embeddings and other hierarchical features.
+The main purpose of this model is to provide an easy method to extract state-of-the-art speaker embeddings and other features for downstream tasks.
+The speaker embeddings are recommended for tasks which rely directly on speaker identification (e.g. speaker verification and speaker diarization).
+The hierarchical features are most useful for tasks capturing intra-speaker variance (e.g. emotion recognition and speaker profiling) and, in our experience, prove complementary to the speaker embedding.
+See the original ECAPA2 paper for more details about the architecture and employed training strategy.
+See our speaker profiling paper for an example usage of the hierarchical features.
 <!---
 ## Model Details
 The following table describes the available features:
+| Feature ID| Dimension | Description |
+| ----------- | ----------- | ----------- |
+| gfe_1, gfe_2 | 2048 | Mean and variance of frame-level features as indicated in Figure 1, extracted before ReLU and BatchNorm layer.
+| pool | 3072 | Pooled statistics (mean and variance) before the bottleneck speaker embedding layer, extracted before ReLU layer.
+| attention | 3072 | Same as the pooled statistics but with the attention weights applied.
+| embedding | 192 | The standard ECAPA2 speaker embedding.
 <!--
 The following table describes the available features:
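
As a rough illustration of the updated feature table, the sketch below maps a `label` string such as `'embedding|gfe_1|pool'` to the dimensions listed in the new `Dimension` column. The helper `expected_dims` and the `FEATURE_DIMS` dictionary are hypothetical names introduced here and are not part of the ECAPA2 API. The sketch also assumes that `|` separates feature IDs in `label` and that `gfe_1` and `gfe_2` each have 2048 dimensions, since the table lists them in a single row.

```python
# Dimensions copied from the feature table in this commit.
# Assumption: gfe_1 and gfe_2 each have 2048 dimensions (they share one table row).
FEATURE_DIMS = {
    "gfe_1": 2048,
    "gfe_2": 2048,
    "pool": 3072,
    "attention": 3072,
    "embedding": 192,
}


def expected_dims(label):
    """Return (feature_id, dimension) pairs for a '|'-separated label string.

    Hypothetical helper: assumes '|' separates feature IDs, as the call
    ecapa2_model(audio, label='embedding|gfe_1|pool') suggests.
    """
    pairs = []
    for feature_id in label.split("|"):
        feature_id = feature_id.strip()
        if feature_id not in FEATURE_DIMS:
            raise ValueError(f"unknown feature ID: {feature_id!r}")
        pairs.append((feature_id, FEATURE_DIMS[feature_id]))
    return pairs
```

A lookup like this can serve as a sanity check on the shapes of extracted features before they are passed to a downstream model; how the real model orders or packages multiple requested features is not stated in this diff.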