Commit b58192b by SeaBenSea · verified · 1 parent: a53ae9e

Update README.md

  1. README.md +107 -3
README.md CHANGED
@@ -1,3 +1,107 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language: tr
+ datasets:
+ - TurEV
+ tags:
+ - audio
+ - speech
+ - speech-emotion-recognition
+ license: apache-2.0
+ ---
+
+ # Emotion Recognition in Turkish Speech using HuBERT
+ This HuBERT model is trained on [TurEV-DB](https://github.com/Xeonen/TurEV-DB) to perform speech emotion recognition (SER) in Turkish.
+
+ ## How to use
+
+ ### Requirements
+
+ ```bash
+ # required packages
+ # (the leading "!" is notebook syntax; drop it when running in a regular shell)
+ !pip install git+https://github.com/huggingface/datasets.git
+ !pip install git+https://github.com/huggingface/transformers.git
+ !pip install torchaudio
+ !pip install librosa
+ ```
+
+ ```bash
+ !git clone https://github.com/SeaBenSea/HuBERT-SER.git
+ ```
+
+ ### Prediction
+
+ ```python
+ import sys
+ # Make the cloned HuBERT-SER repository importable as `src`.
+ sys.path.insert(1, './HuBERT-SER/')
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import torchaudio
+ from transformers import AutoConfig, Wav2Vec2FeatureExtractor
+ from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification
+ ```
+
+ ```python
+ model_name_or_path = "SeaBenSea/hubert-large-turkish-speech-emotion-recognition"
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ config = AutoConfig.from_pretrained(model_name_or_path)
+ feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
+ sampling_rate = feature_extractor.sampling_rate
+
+ model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)
+ ```
+
+ ```python
+ def speech_file_to_array_fn(path, sampling_rate):
+     # Load the audio file and resample it to the rate the feature extractor expects.
+     speech_array, _sampling_rate = torchaudio.load(path)
+     resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
+     # squeeze() drops the channel dimension, so this assumes a mono recording.
+     speech = resampler(speech_array).squeeze().numpy()
+     return speech
+
+
+ def predict(path, sampling_rate):
+     speech = speech_file_to_array_fn(path, sampling_rate)
+     inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
+     inputs = {key: inputs[key].to(device) for key in inputs}
+
+     with torch.no_grad():
+         logits = model(**inputs).logits
+
+     # Turn the logits into per-class probabilities and format them as percentages.
+     scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
+     outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"} for i, score in
+                enumerate(scores)]
+     return outputs
+ ```
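
As a rough illustration of the score formatting in `predict`, the sketch below applies a softmax to a plain list of logits and renders each probability as a percentage. It uses only the Python standard library rather than PyTorch. The helper names `softmax` and `format_scores`, the example logits, and the hard-coded label list are assumptions made for illustration; in the README code the labels come from `config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_scores(logits, labels):
    """Pair each label with its softmax probability, formatted as a percentage."""
    probs = softmax(logits)
    return [
        {"Emotion": label, "Score": f"{prob * 100:.1f}%"}
        for label, prob in zip(labels, probs)
    ]


# Hypothetical values: label order assumed from the example output in the README,
# logits invented purely for demonstration.
labels = ["Angry", "Calm", "Happy", "Sad"]
example_logits = [6.0, -2.0, -0.5, -0.5]

example = format_scores(example_logits, labels)
for row in example:
    print(row)
```

The percentages are rounded to one decimal place, so they may not add up to exactly 100%.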
+
+ ```python
+ path = "../dataset/TurEV/Angry/1157_kz_acik.wav"
+ outputs = predict(path, sampling_rate)
+ outputs
+ ```
+
+ ```text
+ [
+     {'Emotion': 'Angry', 'Score': '99.8%'},
+     {'Emotion': 'Calm', 'Score': '0.0%'},
+     {'Emotion': 'Happy', 'Score': '0.1%'},
+     {'Emotion': 'Sad', 'Score': '0.1%'}
+ ]
+ ```
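
If only the most likely emotion is needed, the list returned by `predict` can be reduced to a single label. The following is a minimal sketch that assumes the output format shown above (a list of dicts with `Emotion` and a percentage string in `Score`); the helper name `top_emotion` is hypothetical and not part of the HuBERT-SER repository.

```python
def top_emotion(outputs):
    """Return (label, percentage) for the highest-scoring entry."""
    # "Score" is a string such as "99.8%", so strip the sign before comparing.
    best = max(outputs, key=lambda item: float(item["Score"].rstrip("%")))
    return best["Emotion"], float(best["Score"].rstrip("%"))


# The example prediction from the README, copied verbatim.
outputs = [
    {"Emotion": "Angry", "Score": "99.8%"},
    {"Emotion": "Calm", "Score": "0.0%"},
    {"Emotion": "Happy", "Score": "0.1%"},
    {"Emotion": "Sad", "Score": "0.1%"},
]

label, score = top_emotion(outputs)
print(label, score)
```

Because the scores are stored as formatted strings, ties after rounding are resolved in favor of the first entry in the list.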
+
+
+ ## Evaluation
+ The following table summarizes the scores obtained by the model, overall and per class.
+
+
+ | Emotions | precision | recall | f1-score | accuracy |
+ |:---------:|:---------:|:------:|:--------:|:--------:|
+ | Angry | 0.97 | 0.99 | 0.98 | |
+ | Calm | 0.89 | 0.95 | 0.92 | |
+ | Happy | 0.98 | 0.93 | 0.95 | |
+ | Sad | 0.97 | 0.93 | 0.95 | |
+ | | | | Overall | 0.95 |
+
+
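
The f1-score column can be cross-checked against the precision and recall columns, since F1 is their harmonic mean: `2 * P * R / (P + R)`. The sketch below recomputes it from the rounded values in the table; the helper name `f1_score` and the `table` dictionary are illustrative, and small discrepancies would be possible in general because the inputs are already rounded to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) per class, copied from the table above.
table = {
    "Angry": (0.97, 0.99, 0.98),
    "Calm": (0.89, 0.95, 0.92),
    "Happy": (0.98, 0.93, 0.95),
    "Sad": (0.97, 0.93, 0.95),
}

# Recompute F1 for every class and round to the table's two decimals.
recomputed = {
    emotion: round(f1_score(precision, recall), 2)
    for emotion, (precision, recall, _reported) in table.items()
}

for emotion, (_p, _r, reported) in table.items():
    print(f"{emotion}: reported {reported:.2f}, recomputed {recomputed[emotion]:.2f}")
```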
+ ## Questions?
+ Open a GitHub issue [here](https://github.com/SeaBenSea/HuBERT-SER).