Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification
Abstract
Major Depressive Disorder (MDD) is a pervasive mental health condition that affects approximately 300 million people worldwide. This work presents a novel BiLSTM-based, tri-modal, model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients for audio, Facial Action Units for video, and a two-shot GPT-4 model for text. This is the first work to incorporate large language models into a multi-modal architecture for this task. It surpasses all baseline models and multiple state-of-the-art models on both the DAIC-WOZ AVEC 2016 Challenge cross-validation split and a Leave-One-Subject-Out cross-validation split. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.
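To make the model-level fusion design concrete, here is a minimal PyTorch sketch of one plausible reading of the architecture: one BiLSTM encoder per modality, whose pooled outputs are concatenated and passed to a binary classifier. All feature dimensions, layer sizes, and the pooling strategy are illustrative assumptions not given in the abstract, and the text branch stands in for GPT-4-derived embeddings.

```python
# Minimal sketch of a tri-modal, model-level fusion BiLSTM classifier.
# Dimensions and pooling are assumptions for illustration only.
import torch
import torch.nn as nn

class TriModalBiLSTM(nn.Module):
    def __init__(self, mfcc_dim=40, fau_dim=35, text_dim=768, hidden=128):
        super().__init__()
        # One BiLSTM encoder per modality: model-level fusion encodes each
        # modality separately, then combines the resulting representations.
        self.audio_enc = nn.LSTM(mfcc_dim, hidden, batch_first=True, bidirectional=True)
        self.video_enc = nn.LSTM(fau_dim, hidden, batch_first=True, bidirectional=True)
        self.text_enc = nn.LSTM(text_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(6 * hidden, hidden),  # 3 modalities x 2 directions x hidden
            nn.ReLU(),
            nn.Linear(hidden, 1),           # single logit: depressed / not depressed
        )

    @staticmethod
    def _pool(seq_out):
        # Mean-pool over time as a simple sequence summary (an assumption).
        return seq_out.mean(dim=1)

    def forward(self, mfcc, fau, text):
        a, _ = self.audio_enc(mfcc)   # (B, T_a, 2*hidden)
        v, _ = self.video_enc(fau)    # (B, T_v, 2*hidden)
        t, _ = self.text_enc(text)    # (B, T_t, 2*hidden)
        fused = torch.cat([self._pool(a), self._pool(v), self._pool(t)], dim=-1)
        return self.classifier(fused).squeeze(-1)

model = TriModalBiLSTM()
logit = model(torch.randn(2, 100, 40),   # MFCC frames
              torch.randn(2, 100, 35),   # Facial Action Unit frames
              torch.randn(2, 50, 768))   # per-utterance text embeddings
print(torch.sigmoid(logit))              # depression probability per sample
```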
Community
Let me know your thoughts and suggestions!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities (2024)
- A Multimodal Framework for the Assessment of the Schizophrenia Spectrum (2024)
- Self-Supervised Embeddings for Detecting Individual Symptoms of Depression (2024)
- Evaluating Large Language Models for Anxiety and Depression Classification using Counseling and Psychotherapy Transcripts (2024)
- We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation (2024)
If you want recommendations for any paper on Hugging Face, check out this Space. You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`