emon-j committed on
Commit 8dd8320 · verified · 1 parent: 4ade5d7

Update README.md

Files changed (1)
  1. README.md +0 -44
README.md CHANGED
@@ -20,51 +20,7 @@ tags:
 
 **Whisper Speaker Identification (WSI)** is a state-of-the-art speaker identification model designed for multilingual scenarios.The WSI model adapts OpenAI's Whisper encoder and fine-tunes it with a projection head using triplet loss-based metric learning. This approach enhances its ability to generate discriminative, language-agnostic speaker embeddings.WSI demonstrates state-of-the-art performance on multilingual datasets, achieving lower Equal Error Rates (EER) and higher F1 Scores compared to models such as **pyannote/wespeaker-voxceleb-resnet34-LM** and **speechbrain/spkrec-ecapa-voxceleb**.
 
-## Installation
 
-Install the `whisper-speaker-id` library via pip:
-
-```
-pip install whisper-speaker-id
-```
-
-## Usage
-
-The `wsi` library provides a simple interface to use the WSI model for embedding generation and speaker similarity tasks.
-
-## Download the model from Huggingface
-
-[WSI Model on Hugging Face](https://huggingface.co/emon-j/WSI)
-
-### Generate Speaker Embeddings
-
-```python
-from whisper-speaker-id import load_model, process_single_audio
-model, feature_extractor = load_model(
-model_path_or_repo_id="emon-j/WSI",
-filename="wsi.pth"
-)
-# Process an audio file
-embedding = process_single_audio(model, feature_extractor, "path/to/audio.wav")
-print("Speaker Embedding:", embedding)
-```
-
-### Calculate Similarity Between Two Audio Files
-
-```python
-from whisper-speaker-id import load_model, process_audio_pair
-
-model, feature_extractor = load_model(
-model_path_or_repo_id="emon-j/WSI",
-filename="wsi.pth"
-)
-
-# Compute similarity between two audio files
-similarity = process_audio_pair(
-model, feature_extractor, "path/to/audio1.wav", "path/to/audio2.wav"
-)
-print("Similarity Score:", similarity)
-```
 
 ### Cite This Work
 
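The README context describes WSI as fine-tuning a projection head on the Whisper encoder with triplet loss-based metric learning. As an illustration only, here is a minimal pure-Python sketch of a standard triplet margin loss on embedding vectors. The choice of cosine distance, the margin of 0.2, and the helper names `cosine_distance` and `triplet_loss` are assumptions for this sketch, not settings or code taken from WSI.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 for same direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, not WSI's actual hyperparameter
) -> float:
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the different-speaker (negative) embedding is at
    least `margin` farther from the anchor than the same-speaker (positive) one.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Easy triplet: the negative points in a very different direction, so loss is 0.
print(triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]))
# Hard triplet: the negative is almost as close as the positive, so loss > 0.
print(triplet_loss([1.0, 0.0], [0.9, 0.1], [0.95, 0.05]))
```

Because of the hinge, only triplets that violate the margin contribute to the loss; how WSI mines or batches its triplets is not described in this README.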
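The README compares models by Equal Error Rate (EER), the verification operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The following is a minimal sketch, assuming each trial has a similarity score (higher means more likely the same speaker) and a 0/1 same-speaker label; the function name `equal_error_rate` and the simple sweep over observed scores are illustrative choices, not WSI's evaluation code.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate EER for speaker-verification trials.

    scores: similarity score per trial (higher = more likely same speaker).
    labels: 1 for same-speaker (target) trials, 0 for different-speaker trials.
    Returns (eer, threshold) at the observed threshold where FAR and FRR are
    closest, reporting their average as the EER estimate.
    """
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, best_eer, best_t = float("inf"), 0.0, 0.0
    for t in sorted(set(scores)):
        # A trial is accepted as "same speaker" when its score >= t.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_nontarget
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t


# Perfectly separated scores give an EER of 0.
print(equal_error_rate([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```

Sweeping only the observed scores keeps the sketch short; evaluation toolkits typically interpolate along the ROC or DET curve for a finer estimate on large trial lists.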