+---
+language: tr
+datasets:
+- TurEV
+tags:
+- audio
+- speech
+- speech-emotion-recognition
+license: apache-2.0
+---
+# Emotion Recognition in Turkish Speech using HuBERT
+This HuBERT model was trained on [TurEV-DB](https://github.com/Xeonen/TurEV-DB) to perform speech emotion recognition (SER) in Turkish.
+## How to use
+### Requirements
+```bash
+# Required packages; the leading "!" is Jupyter/Colab notebook syntax, so drop it in a regular shell
+!pip install git+https://github.com/huggingface/datasets.git
+!pip install git+https://github.com/huggingface/transformers.git
+!pip install torchaudio
+!pip install librosa
+```
+```bash
+!git clone https://github.com/SeaBenSea/HuBERT-SER.git
+```
+### Prediction
+```python
+import sys
+sys.path.insert(1, './HuBERT-SER/')
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torchaudio
+from transformers import AutoConfig, Wav2Vec2FeatureExtractor
+from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification
+```
+```python
+model_name_or_path = "SeaBenSea/hubert-large-turkish-speech-emotion-recognition"
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+config = AutoConfig.from_pretrained(model_name_or_path)
+feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
+sampling_rate = feature_extractor.sampling_rate
+model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)
+```
+```python
+def speech_file_to_array_fn(path, sampling_rate):
+    # Load the waveform with shape (channels, samples) and its native sampling rate.
+    speech_array, _sampling_rate = torchaudio.load(path)
+    # Resample to the rate the feature extractor expects.
+    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
+    # Average the channels so stereo files also yield a 1-D array.
+    speech = resampler(speech_array).mean(dim=0).numpy()
+    return speech
+
+def predict(path, sampling_rate):
+    speech = speech_file_to_array_fn(path, sampling_rate)
+    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
+    # Move every input tensor to the same device as the model.
+    inputs = {key: inputs[key].to(device) for key in inputs}
+    with torch.no_grad():
+        logits = model(**inputs).logits
+    # Convert logits to class probabilities for the single input file.
+    scores = F.softmax(logits, dim=1).cpu().numpy()[0]
+    outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"} for i, score in
+               enumerate(scores)]
+    return outputs
+```
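The score formatting inside `predict` can be tried out without loading the model. The following is a minimal sketch in plain Python, assuming made-up logits and a hypothetical `id2label` mapping; the real mapping is read from `config.id2label`, and the helper names `softmax` and `format_scores` are illustrative only.

```python
import math

# Hypothetical label order; the real mapping is read from config.id2label.
id2label = {0: "Angry", 1: "Calm", 2: "Happy", 3: "Sad"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_scores(logits):
    """Pair each class label with its probability as a percentage string."""
    return [
        {"Emotion": id2label[i], "Score": f"{p * 100:.1f}%"}
        for i, p in enumerate(softmax(logits))
    ]


# Made-up logits, for illustration only (not produced by the model).
example_logits = [6.0, -2.0, 0.5, 0.0]
probs = softmax(example_logits)
results = format_scores(example_logits)

# The predicted emotion is the class with the highest probability.
best_index = max(range(len(probs)), key=probs.__getitem__)
print(results)
print("Predicted emotion:", id2label[best_index])
```

Because softmax normalizes the logits, the percentages sum to roughly 100%, and the largest logit always maps to the highest score.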
+```python
+path = "../dataset/TurEV/Angry/1157_kz_acik.wav"
+outputs = predict(path, sampling_rate)
+outputs
+```
+```python
+[
+  {'Emotion': 'Angry', 'Score': '99.8%'},
+  {'Emotion': 'Calm', 'Score': '0.0%'},
+  {'Emotion': 'Happy', 'Score': '0.1%'},
+  {'Emotion': 'Sad', 'Score': '0.1%'}
+]
+```
+## Evaluation
+The following table summarizes the model's per-class precision, recall, and F1-score, along with its overall accuracy.
+|  Emotion  | Precision | Recall | F1-score | Accuracy |
+|:---------:|:---------:|:------:|:--------:|:--------:|
+|   Angry   |    0.97   |  0.99  |   0.98   |          |
+|   Calm    |    0.89   |  0.95  |   0.92   |          |
+|   Happy   |    0.98   |  0.93  |   0.95   |          |
+|   Sad     |    0.97   |  0.93  |   0.95   |          |
+|  Overall  |           |        |          |   0.95   |
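As a quick consistency check, the F1-score column can be recomputed from the precision and recall columns, since F1 is their harmonic mean. The sketch below copies the rounded values from the table; the helper name `f1_score` is illustrative, and because the inputs are already rounded to two decimals, the recomputed values only agree with the reported ones up to rounding.

```python
# Per-class (precision, recall, reported F1) copied from the table above.
reported = {
    "Angry": (0.97, 0.99, 0.98),
    "Calm": (0.89, 0.95, 0.92),
    "Happy": (0.98, 0.93, 0.95),
    "Sad": (0.97, 0.93, 0.95),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for emotion, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{emotion}: computed F1 = {f1:.2f}, reported F1 = {f1_reported:.2f}")
```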
+## Questions?
+Open an issue on the [GitHub repository](https://github.com/SeaBenSea/HuBERT-SER).