Jenthe committed on
Commit
ab62acf
·
1 Parent(s): ebbe4a8

Update README.md

Files changed (1)
  1. README.md +16 -7
README.md CHANGED
@@ -4,7 +4,16 @@ license: cc-by-nc-4.0
4
 
5
  # ECAPA2 Speaker Embedding Extractor
6
 
7
- ECAPA2 is a hybrid neural network architecture and training strategy for speaker recognition. The provided model is pre-trained and has an easy-to-use API to extract speaker embeddings.
8
 
9
  <!---
10
  ## Model Details
@@ -83,12 +92,12 @@ feature = ecapa2_model(audio, label='embedding|gfe_1|pool')
83
 
84
  The following table describes the available features:
85
 
86
- | Feature ID| Description |
87
- | ----------- | ----------- |
88
- | gfe_1, gfe_2 | Mean and variance of frame-level features as indicated in Figure 1, extracted before ReLU and BatchNorm layer.
89
- | pool | Pooled statistics (mean and variance) before the bottleneck speaker embedding layer, extracted before ReLU layer.
90
- | attention | Same as the pooled statistics but with the attention weights applied.
91
- | embedding | The standard ECAPA2 speaker embedding.
92
 
93
  <!--
94
  The following table describes the available features:
 
4
 
5
  # ECAPA2 Speaker Embedding Extractor
6
 
7
+ ECAPA2 is a hybrid neural network architecture and training strategy for generating robust speaker embeddings.
8
+ The provided pre-trained model has an easy-to-use API to extract speaker embeddings and other hierarchical features.
9
+
10
+ The main purpose of this model is to provide an easy method to extract state-of-the-art speaker embeddings and other features for downstream tasks.
11
+ The speaker embeddings are recommended for tasks which rely directly on the speaker identity (e.g. speaker verification and speaker diarization).
12
+ The hierarchical features are most useful for tasks capturing intra-speaker variance (e.g. emotion recognition and speaker profiling) and, in our experience, prove complementary to the speaker embedding.
13
+
14
+ See the original ECAPA2 paper for more details about the architecture and employed training strategy.
15
+
16
+ See our speaker profiling paper for an example usage of the hierarchical features.
17
 
18
  <!---
19
  ## Model Details
 
92
 
93
  The following table describes the available features:
94
 
95
+ | Feature ID | Dimension | Description |
96
+ | ----------- | ----------- | ----------- |
97
+ | gfe_1, gfe_2 | 2048 | Mean and variance of frame-level features as indicated in Figure 1, extracted before ReLU and BatchNorm layer. |
98
+ | pool | 3072 | Pooled statistics (mean and variance) before the bottleneck speaker embedding layer, extracted before ReLU layer. |
99
+ | attention | 3072 | Same as the pooled statistics but with the attention weights applied. |
100
+ | embedding | 192 | The standard ECAPA2 speaker embedding. |
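
Reviewer note: as a reading aid for the dimensions in the table above, here is a minimal sketch that does not load the model. It uses random vectors as stand-ins for extracted features, and the dict-per-label layout is an assumption, since the return format of `ecapa2_model` is not shown in this diff. The helper names (`placeholder_features`, `cosine_score`, `combine`) are hypothetical. It illustrates two uses the README text mentions: comparing two 192-dimensional speaker embeddings by cosine similarity, and concatenating the embedding with a hierarchical feature for a downstream task.

```python
import math
import random

# Dimensions copied from the feature table in this diff.
FEATURE_DIMS = {
    "gfe_1": 2048,
    "gfe_2": 2048,
    "pool": 3072,
    "attention": 3072,
    "embedding": 192,
}


def placeholder_features(label, seed):
    """Hypothetical stand-in for model output (NOT the ECAPA2 API).

    Splits a pipe-separated label string such as 'embedding|gfe_1|pool'
    and returns one random vector of the documented size per feature ID.
    """
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIMS[name])]
        for name in label.split("|")
    }


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def combine(features, names):
    """Concatenate the selected feature vectors into one flat vector."""
    combined = []
    for name in names:
        combined.extend(features[name])
    return combined


# Two placeholder "utterances", requested with the label string from the hunk header.
utt_a = placeholder_features("embedding|gfe_1|pool", seed=0)
utt_b = placeholder_features("embedding|gfe_1|pool", seed=1)

# Speaker-identity style comparison on the 192-dimensional embeddings.
score = cosine_score(utt_a["embedding"], utt_b["embedding"])

# Embedding plus pooled statistics as input for a downstream task.
combined = combine(utt_a, ["embedding", "pool"])
print(len(combined))  # 192 + 3072 = 3264
```

With real features, `score` would be compared against a threshold tuned on held-out data, and `combined` would feed a task-specific classifier; neither step is specified by this commit.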
101
 
102
  <!--
103
  The following table describes the available features: