language: fr
license: mit
library_name: transformers
tags:
- audio
- audio-to-audio
- speech
datasets:
- Cnam-LMSSC/vibravox
model-index:
- name: EBEN(M=4,P=4,Q=4)
  results:
  - task:
      name: Bandwidth Extension
      type: speech-enhancement
    dataset:
      name: Vibravox["throat_microphone"]
      type: Cnam-LMSSC/vibravox
      args: fr
    metrics:
    - name: Test STOI, in-domain training
      type: stoi
      value: 0.8338
    - name: Test Noresqa-MOS, in-domain training
      type: n-mos
      value: 3.862
Model Card
- Developed by: Cnam-LMSSC
- Model type: EBEN (see publication)
- Language: French
- License: MIT
- Finetuned dataset: speech_clean subset of Cnam-LMSSC/vibravox
- Samplerate for usage: 16kHz
Overview
This bandwidth extension model was trained on data from a single body-conduction sensor of the Vibravox dataset: the throat microphone. It is designed to enhance the audio quality of body-conducted speech by denoising it and regenerating mid and high frequencies from low-frequency content only.
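To make "low-frequency content only" concrete, the sketch below measures the fraction of a signal's spectral energy that lies at or above a cutoff frequency. It is an illustration only and not part of the EBEN code: the helper name `high_band_energy_ratio`, the 2 kHz cutoff, and the synthetic test tones are all assumptions chosen for the example, and the naive DFT is only practical for short signals.

```python
import cmath
import math


def high_band_energy_ratio(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz (naive one-sided DFT)."""
    n = len(signal)
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        # DFT coefficient of bin k, whose centre frequency is k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


SAMPLE_RATE = 16_000  # sample rate used by the model
N = 160               # 10 ms of audio; both tones fall on exact DFT bins

# Hypothetical stand-in for a band-limited, body-conducted signal: one 500 Hz tone
low_only = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]
# Same signal with an added 4 kHz component, standing in for regenerated high frequencies
wideband = [x + 0.5 * math.sin(2 * math.pi * 4000 * t / SAMPLE_RATE) for t, x in enumerate(low_only)]

print(high_band_energy_ratio(low_only, SAMPLE_RATE, cutoff_hz=2000))
print(high_band_energy_ratio(wideband, SAMPLE_RATE, cutoff_hz=2000))
```

The first ratio is essentially zero and the second is about 0.2. Applying the same measurement to a real recording before and after enhancement gives a rough view of how much high-frequency energy was regenerated, although it says nothing about perceptual quality.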
Disclaimer
This model was trained for a specific non-conventional speech sensor and is intended to be used with in-domain data. Using it on data from other sensors may result in suboptimal performance.
Training procedure
Detailed instructions for reproducing the experiments are available in the jhauret/vibravox GitHub repository.
Inference script:
import torch
import torchaudio
from datasets import load_dataset
from vibravox import EBENGenerator

# Load the pretrained EBEN generator for the throat microphone
model = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
# Stream the test split of the speech_clean subset
test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)
# Take the first throat-microphone recording and resample it from 48 kHz to 16 kHz
audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.throat_microphone"]["array"])
audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)
# Trim to a length the model accepts, then run the enhancement
cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz)
enhanced_audio_16kHz = model(cut_audio_16kHz)
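The script trims the waveform with `cut_to_valid_length` before calling the model. The actual rule is defined in the vibravox package and is not reproduced here. As a rough illustration of what such a step commonly does, the sketch below trims a signal to the largest length that is a multiple of a block size. The helper name `trim_to_multiple` and the block size of 256 are hypothetical and may not match the model's real constraint.

```python
def trim_to_multiple(num_samples: int, block: int) -> int:
    """Return the largest length <= num_samples that is a multiple of block."""
    if block <= 0:
        raise ValueError("block must be positive")
    return (num_samples // block) * block


# Hypothetical example: a 1000-sample signal and an assumed block size of 256
samples = list(range(1000))
valid_length = trim_to_multiple(len(samples), block=256)
trimmed = samples[:valid_length]  # drop the trailing samples that do not fill a block
print(len(trimmed))  # 768
```

Trimming discards at most `block - 1` trailing samples, which at 16 kHz is a few milliseconds of audio for a block size in this range.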
Links to BWE models trained on other body-conduction sensors:
An entry point to all audio bandwidth extension (BWE) models trained on different sensor data from the Vibravox dataset is available at https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_bwe_models.