---
license: cc-by-sa-4.0
datasets:
- Ar4ikov/iemocap_audio_text_splitted
language:
- en
- zh
metrics:
- f1
library_name: transformers
pipeline_tag: audio-classification
tags:
- speech-emotion-recognition
---

# Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on English and Chinese data from adult speakers. The model is trained on the training sets of [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D), [ESD](https://github.com/HLTSingapore/Emotional-Speech-Data), [IEMOCAP](https://sail.usc.edu/iemocap/iemocap_release.htm), and [TESS](https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess).

When using this model, make sure that your speech input is sampled at 16 kHz.

The scripts used for training and evaluation can be found here: [https://github.com/HLTCHKUST/elderly_ser/tree/main](https://github.com/HLTCHKUST/elderly_ser/tree/main)

## Evaluation Results

For the details (e.g., the statistics of `train`, `valid`, and `test` data), please refer to our paper on [arXiv](https://arxiv.org/abs/2306.14517). It also provides the model's speech emotion recognition performance on: English-All, Chinese-All, English-Elderly, Chinese-Elderly, English-Adults, Chinese-Adults.

## Citation

Our paper will be published at INTERSPEECH 2023. In the meantime, you can find our paper on [arXiv](https://arxiv.org/abs/2306.14517).

If you find our work useful, please consider citing our paper as follows:

```
@misc{cahyawijaya2023crosslingual,
      title={Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition},
      author={Samuel Cahyawijaya and Holy Lovenia and Willy Chung and Rita Frieske and Zihan Liu and Pascale Fung},
      year={2023},
      eprint={2306.14517},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
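
## Input Resampling Sketch

The model expects speech sampled at 16 kHz, so audio recorded at another rate has to be resampled first. The snippet below is a minimal, dependency-free sketch of that step. It assumes a mono waveform given as a list of floats, and the helper name `resample_linear` is hypothetical, not part of this repository. It uses plain linear interpolation with no anti-aliasing filter, so it only illustrates the idea. For real use, an established routine such as `librosa.resample` or `torchaudio.functional.resample` is likely the better choice.

```python
from typing import List, Sequence

TARGET_SAMPLE_RATE = 16_000  # rate the model card asks for


def resample_linear(
    samples: Sequence[float],
    orig_rate: int,
    target_rate: int = TARGET_SAMPLE_RATE,
) -> List[float]:
    """Resample a mono waveform by linear interpolation.

    Illustrative only: no low-pass filtering is applied, so
    downsampling real speech this way can introduce aliasing.
    """
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if orig_rate == target_rate or len(samples) == 0:
        return list(samples)

    # Number of output samples covering the same duration.
    n_out = max(1, round(len(samples) * target_rate / orig_rate))
    # Distance between output samples, measured in input samples.
    step = orig_rate / target_rate
    last = len(samples) - 1

    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    # 48 samples at 48 kHz (1 ms of audio) become 16 samples at 16 kHz.
    ramp = [float(i) for i in range(48)]
    print(len(resample_linear(ramp, 48_000)))
```

The resampled waveform can then be passed to the model's feature extractor together with `sampling_rate=16000`, so that the declared rate matches the data.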