Update README.md
README.md
CHANGED
@@ -4,7 +4,16 @@ license: cc-by-nc-4.0
 
 # ECAPA2 Speaker Embedding Extractor
 
-ECAPA2 is a hybrid neural network architecture and training strategy for
+ECAPA2 is a hybrid neural network architecture and training strategy for generating robust speaker embeddings.
+The provided pre-trained model has an easy-to-use API to extract speaker embeddings and other hierarchical features.
+
+The main purpose of this model is to provide an easy method to extract state-of-the-art speaker embeddings and other features for downstream tasks.
+The speaker embeddings are recommended for tasks which rely directly on speaker identification (e.g. speaker verification and speaker diarization).
+The hierarchical features are most useful for tasks capturing intra-speaker variance (e.g. emotion recognition and speaker profiling) and prove complementary to the speaker embedding in our experience.
+
+See the original ECAPA2 paper for more details about the architecture and employed training strategy.
+
+See our speaker profiling paper for an example usage of the hierarchical features.
 
 <!---
 ## Model Details
@@ -83,12 +92,12 @@ feature = ecapa2_model(audio, label='embedding|gfe_1|pool')
 
 The following table describes the available features:
 
-| Feature ID| Description |
-| ----------- | ----------- |
-| gfe_1, gfe_2 | Mean and variance of frame-level features as indicated in Figure 1, extracted before ReLU and BatchNorm layer.
-| pool | Pooled statistics (mean and variance) before the bottleneck speaker embedding layer, extracted before ReLU layer.
-| attention | Same as the pooled statistics but with the attention weights applied.
-| embedding | The standard ECAPA2 speaker embedding.
+| Feature ID| Dimension | Description |
+| ----------- | ----------- | ----------- |
+| gfe_1, gfe_2 | 2048 | Mean and variance of frame-level features as indicated in Figure 1, extracted before ReLU and BatchNorm layer.
+| pool | 3072 | Pooled statistics (mean and variance) before the bottleneck speaker embedding layer, extracted before ReLU layer.
+| attention | 3072 | Same as the pooled statistics but with the attention weights applied.
+| embedding | 192 | The standard ECAPA2 speaker embedding.
 
 <!--
 The following table describes the available features:
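
The new Dimension column makes it possible to sanity-check the width of an extracted feature. The sketch below is illustrative only: it assumes that a `|`-separated label such as `'embedding|gfe_1|pool'` returns the listed features concatenated along the feature axis, which this diff does not confirm. `FEATURE_DIMS` and `expected_dim` are hypothetical names introduced here and are not part of the ECAPA2 API.

```python
# Feature dimensions copied from the table added in this commit.
FEATURE_DIMS = {
    "gfe_1": 2048,
    "gfe_2": 2048,
    "pool": 3072,
    "attention": 3072,
    "embedding": 192,
}


def expected_dim(label: str) -> int:
    """Hypothetical helper: total width of a '|'-separated label.

    Assumes the requested features are concatenated, so their
    dimensions simply add up. This is an assumption, not documented
    behaviour.
    """
    total = 0
    for feature_id in label.split("|"):
        if feature_id not in FEATURE_DIMS:
            raise ValueError(f"unknown feature ID: {feature_id!r}")
        total += FEATURE_DIMS[feature_id]
    return total


print(expected_dim("embedding|gfe_1|pool"))  # 192 + 2048 + 3072 = 5312
```

If the concatenation assumption holds, comparing this value against the last dimension of the tensor returned by the model would be a quick check that the label string was parsed as intended.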