emon-j committed bb66f96 (verified) · 1 parent: 8ea2a21

Update README.md

Files changed (1): README.md (+73 -3)
The previous `README.md` contained only YAML front matter declaring `license: cc-by-nc-sa-4.0`. This commit replaces it with the 73-line model card below.

---
license: cc-by-nc-sa-4.0
datasets:
- voice-is-cool/voxtube
base_model:
- openai/whisper-tiny
library_name: transformers
tags:
- speaker-verification
- voice
- audio
- speaker-recognition
- speaker-embedding
- speaker-identification
- speaker
- whisper
- voxtube
---

# Whisper Speaker Identification (WSI)

**Whisper Speaker Identification (WSI)** is a speaker identification model designed for multilingual scenarios. It adapts the encoder of OpenAI's Whisper (`openai/whisper-tiny`) and fine-tunes it, together with a projection head, using triplet loss-based metric learning. This objective pulls embeddings of the same speaker together and pushes embeddings of different speakers apart, yielding discriminative, language-agnostic speaker embeddings. On multilingual datasets, WSI reports state-of-the-art performance, with lower Equal Error Rates (EER) and higher F1 scores than **pyannote/wespeaker-voxceleb-resnet34-LM** and **speechbrain/spkrec-ecapa-voxceleb**.
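
This README does not spell out the training objective beyond "triplet loss-based metric learning". As a rough illustration only, the standard triplet margin loss on L2-normalized embeddings can be sketched in pure Python as follows. The Euclidean distance, the margin of 0.2, and the helper names are assumptions for this sketch, not WSI's actual training code.

```python
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length (returns it unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed margin, not taken from WSI
) -> float:
    """Triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The anchor and positive come from the same speaker; the negative comes
    from a different speaker. All three are L2-normalized first.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


# Toy 3-dimensional embeddings (illustrative values only).
anchor = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]  # same speaker: close to the anchor
negative = [0.0, 1.0, 0.0]  # different speaker: far from the anchor
print(triplet_loss(anchor, positive, negative))  # 0.0: negative already beyond the margin
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only "hard" triplets contribute gradients. How WSI samples or mines triplets is not described here.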

## Installation

Install the `whisper-speaker-id` library via pip:

```bash
pip install whisper-speaker-id
```

## Usage

The package is imported as `wsi` and provides a simple interface to the WSI model for generating speaker embeddings and scoring speaker similarity.

### Download the Model from Hugging Face

Download the model checkpoint from [WSI Model on Hugging Face](https://huggingface.co/emon-j/WSI) and pass its local path to `load_model` in the examples below.

### Generate Speaker Embeddings

```python
from wsi import load_model, process_single_audio

# Load the model and feature extractor
model, feature_extractor = load_model("path/to/wsi_model.pth")

# Generate embeddings for an audio file
embedding = process_single_audio(model, feature_extractor, "path/to/audio.wav")
print("Speaker Embedding:", embedding)
```
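
Because WSI is aimed at speaker identification, a typical next step is to compare a query embedding against embeddings enrolled for known speakers. The sketch below assumes each embedding has already been converted to a flat list of floats (for example with `.flatten().tolist()` if `process_single_audio` returns a PyTorch tensor, which this README does not confirm). It scores candidates with cosine similarity; `identify_speaker` and its threshold value are hypothetical and not part of the `wsi` library.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_a * norm_b)


def identify_speaker(
    query: Sequence[float],
    enrolled: dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical value; tune on held-out data
) -> Optional[str]:
    """Return the enrolled speaker most similar to `query`, or None if no
    enrolled speaker reaches `threshold` (open-set identification)."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy embeddings standing in for outputs of process_single_audio().
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.95]}
print(identify_speaker([0.8, 0.2, 0.1], enrolled))  # alice
print(identify_speaker([-1.0, 0.0, 0.0], enrolled))  # None
```

The threshold makes this open-set: a query that resembles no enrolled speaker closely enough is rejected instead of being forced onto the nearest match.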

### Calculate Similarity Between Two Audio Files

```python
from wsi import load_model, process_audio_pair

# Load the model and feature extractor
model, feature_extractor = load_model("path/to/wsi_model.pth")

# Compute similarity between two audio files
similarity = process_audio_pair(
    model, feature_extractor, "path/to/audio1.wav", "path/to/audio2.wav"
)
print("Similarity Score:", similarity)
```
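
The model's results are reported as Equal Error Rates (EER): the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). Given similarity scores for trials labelled same-speaker or different-speaker, for example scores collected with `process_audio_pair` (assuming a higher score means "more likely the same speaker"), the EER can be approximated by sweeping a threshold over the observed scores. This is an illustrative pure-Python sketch with made-up scores, not the evaluation code behind the reported numbers.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels[i] is 1 for a same-speaker (target) trial and 0 for a
    different-speaker (non-target) trial. A trial is accepted when its
    score is >= the threshold. Returns the mean of FAR and FRR at the
    threshold where the two are closest.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        # False acceptance: different-speaker trials scored at/above threshold.
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        # False rejection: same-speaker trials scored below threshold.
        frr = sum(s < threshold for s in targets) / len(targets)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


# Made-up similarity scores for six trials (1 = same speaker).
scores = [0.91, 0.85, 0.40, 0.62, 0.30, 0.15]
labels = [1, 1, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # 0.3333333333333333
```

Sweeping only the observed scores gives a step-wise approximation of the EER; evaluation toolkits often interpolate the ROC curve instead.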

## Cite This Work

Coming soon.

## License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license.