Commit 046e707 by kianpaya (verified)
1 parent: a230944

Upload 8 files

Files changed (8)
  1. LICENSE +21 -0
  2. README.md +52 -0
  3. analysis.ipynb +0 -0
  4. app.py +4 -0
  5. egpt.py +51 -0
  6. elit.py +35 -0
  7. etal.py +190 -0
  8. requirements.txt +12 -0
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 Kian Paya
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,52 @@
+ # MentalHealthGPT
+ ## Overview
+
+ **MentalHealthGPT** is an AI-powered application developed to assist mental health counselors by analyzing the tone of client conversations and generating responses that are sensitive to the emotional context. Combining advanced NLP models, including BERT for tone classification and GPT for response refinement, MentalHealthGPT aims to support mental health professionals in fostering empathetic and productive interactions with clients.
+
+ <p align="center">
+ <img src="Images/Legacy_App.jpg" width="600" alt="MentalHealthGPT Interface">
+ </p>
+
+ <p align="center"><i>MentalHealthGPT app interface: Recognizes conversation tone and provides counseling guidance.</i></p>
+
+ ## Key Features
+
+ - **Tone Classification**: The app uses a BERT-based model to classify user input into one of 21 mental-health categories, such as Anxiety, Depression, Stress, or Grief. This gives counselors insight into the client’s emotional state so they can tailor their responses accordingly.
+
+ - **GPT-Based Response Generation**: Leveraging OpenAI's API, the app retrieves the most similar counselor response from a dataset of counseling conversations (matched by sentence-embedding cosine similarity) and passes it to GPT together with the client's text. Grounding the prompt in a real counselor reply is intended to keep responses contextually appropriate, supportive, and reflective of the client’s needs.
+
+ - **User-Friendly Interface**: MentalHealthGPT is built with Streamlit, offering a straightforward and interactive interface. Counselors can input text, analyze tone, and view responses generated by GPT all within the same platform, making it accessible even for non-technical users.
+
+ ## Hosting
+
+ The application is hosted on **Hugging Face Spaces**, which provides a scalable, secure, and user-friendly environment for real-time interactions. Hosting on Hugging Face Spaces makes the tool accessible from any browser without requiring local installations, providing flexibility and ease of use for mental health professionals.
+
+ ---
+
+ ## Purpose and Impact
+
+ The purpose of MentalHealthGPT is to support mental health counselors by:
+ - **Improving Emotional Awareness**: Helping counselors identify and understand the client’s emotional tone more accurately.
+ - **Enhancing Communication**: Offering emotionally aligned responses that build rapport and foster understanding.
+ - **Saving Time and Effort**: Providing an efficient tool to assist counselors in real-time, allowing them to focus more on interaction quality.
+
+ ## Future Considerations
+
+ - **Broader Emotion Spectrum**: Expanding the emotion classification model to recognize a wider range of emotions, such as optimism, anxiety, or neutrality, would increase the app’s relevance across diverse counseling sessions.
+ - **Privacy and Security Enhancements**: As MentalHealthGPT processes sensitive data, implementing stricter privacy controls and secure data handling practices would enhance trustworthiness.
+ - **Multilingual Support**: Introducing multilingual models to accommodate clients who may not speak English, thereby making the application useful for a wider global audience.
+ - **Fine-Tuning with Mental Health-Specific Data**: Leveraging datasets specifically related to mental health interactions could improve response quality and relevance.
+
+ ## Challenges
+
+ - **Accuracy in Tone Detection**: Tone detection is complex, and achieving accurate classification, especially in nuanced or ambiguous text, remains a challenge. Misclassification can lead to inappropriate or ineffective responses.
+ - **Dependency on External APIs**: Reliance on OpenAI’s API for GPT-based response generation can introduce latency and may be cost-prohibitive with large-scale usage.
+ - **Ethical Considerations**: As an AI-driven tool in mental health, there are ethical considerations around transparency, bias in model responses, and the potential impact of machine-generated responses on clients’ mental health.
+
+ ---
+
+ ## Conclusion
+
+ **MentalHealthGPT** combines AI-driven tone analysis with responsive, context-aware text generation to empower counselors with better communication tools. By leveraging both classification and fine-tuning, it supports mental health professionals in creating empathetic and effective interactions, making it a valuable tool for counseling environments. Hosted on Hugging Face Spaces, it provides an accessible platform for professionals seeking AI-enhanced support in their daily interactions.
+
+ ---
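The overview pairs BERT classification with GPT generation, but in the committed `elit.py` the two models are offered as alternatives (a radio button) rather than chained. The sketch below shows one way a predicted label could be folded into the generation prompt. `build_tone_aware_prompt` and `respond_with_tone` are hypothetical names, and the classifier and generator are passed in as plain callables; in the app they would correspond to `etal().predict` and a GPT call such as `egpt.queryGpt`.

```python
from typing import Callable, Optional


def build_tone_aware_prompt(user_text: str, label: Optional[str]) -> str:
    """Hypothetical helper: prepend the predicted category to the GPT prompt."""
    # Fall back to an unconditioned prompt if no label is available.
    if label is None:
        return f"Respond supportively to the following message:\n\n{user_text}"
    return (
        f"The client's message was classified as: {label}.\n"
        f"Respond in a way that is sensitive to this context:\n\n{user_text}"
    )


def respond_with_tone(
    user_text: str,
    classify: Callable[[str], Optional[str]],
    generate: Callable[[str], str],
) -> str:
    """Stage 1 classifies the text; stage 2 generates a reply conditioned on the label."""
    label = classify(user_text)
    prompt = build_tone_aware_prompt(user_text, label)
    return generate(prompt)


if __name__ == "__main__":
    # Stub callables stand in for the BERT classifier and the OpenAI request.
    reply = respond_with_tone(
        "I can't stop worrying about my exams.",
        classify=lambda text: "Anxiety",
        generate=lambda prompt: prompt.splitlines()[0],
    )
    print(reply)  # The client's message was classified as: Anxiety.
```

Keeping both stages as injected callables lets the pipeline be exercised without downloading BERT weights or spending API credits.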
analysis.ipynb ADDED
The diff for this file is too large to render.
app.py ADDED
@@ -0,0 +1,4 @@
+ from elit import elit
+
+ if __name__ == '__main__':
+     elit()
egpt.py ADDED
@@ -0,0 +1,51 @@
+ import warnings
+ warnings.filterwarnings("ignore")
+ import torchvision
+ torchvision.disable_beta_transforms_warning()
+
+
+ import openai
+ import pandas as pd
+ from transformers import BertTokenizer
+ from sklearn.metrics.pairwise import cosine_similarity
+ from sentence_transformers import SentenceTransformer
+
+
+ class egpt:
+     def __init__(self, apiKey, modelName='gpt-4-turbo', embeddingModel='all-MiniLM-L6-v2', datasetPath='hf://datasets/Amod/mental_health_counseling_conversations/combined_dataset.json'):
+         openai.api_key = apiKey
+         self.modelName = modelName
+         self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
+         self.embeddingModel = SentenceTransformer(embeddingModel)
+         self.dataset = self.loadDataset(datasetPath)
+         self.knowledgeBase = self.createKnowledgeBase()
+
+     def loadDataset(self, path):
+         dataset = pd.read_json(path, lines=True)
+         return dataset[['Context', 'Response']].values.tolist()
+
+     def createKnowledgeBase(self):
+         knowledgeBase = []
+         for context, response in self.dataset:
+             embedding = self.embeddingModel.encode(context)
+             knowledgeBase.append((embedding, response))
+         return knowledgeBase
+
+     def getSimilarResponse(self, userContext):
+         userEmbedding = self.embeddingModel.encode(userContext)
+         similarities = [cosine_similarity([userEmbedding], [kbEmbedding])[0][0] for kbEmbedding, _ in self.knowledgeBase]
+         bestMatchIdx = similarities.index(max(similarities))
+         _, bestResponse = self.knowledgeBase[bestMatchIdx]
+         return bestResponse
+
+     def queryGpt(self, context):
+         response = openai.ChatCompletion.create(
+             model=self.modelName,
+             messages=[{'role': 'user', 'content': context}]
+         )
+         return response.choices[0].message['content']
+
+     def respond(self, userContext):
+         similarResponse = self.getSimilarResponse(userContext)
+         prompt = f'Given the following context and a similar response, please respond appropriately:\n\nContext: {userContext}\n\nSimilar Response: {similarResponse}'
+         return self.queryGpt(prompt)
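`getSimilarResponse` calls `cosine_similarity` once per knowledge-base entry, so every query runs a Python loop over the whole dataset. A minimal sketch of the same nearest-neighbour lookup done as one matrix product, assuming the context embeddings are stacked into a single NumPy array up front (calling `SentenceTransformer.encode` on a list of strings returns such an array). `build_index` and `most_similar` are hypothetical names, and toy 3-d vectors stand in for `all-MiniLM-L6-v2` embeddings.

```python
import numpy as np


def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalise each row once so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def most_similar(index: np.ndarray, query: np.ndarray) -> int:
    """Return the row of `index` with the highest cosine similarity to `query`."""
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = index @ q  # one score per knowledge-base entry
    return int(np.argmax(scores))


if __name__ == "__main__":
    # Toy embeddings for three stored contexts and their paired responses.
    contexts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
    responses = ["A", "B", "C"]
    index = build_index(contexts)
    best = most_similar(index, np.array([0.6, 0.8, 0.0]))
    print(responses[best])  # C
```

Normalising at index-build time moves the per-entry norm computation out of the query path, so each lookup is a single matrix-vector product plus an `argmax`.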
elit.py ADDED
@@ -0,0 +1,35 @@
+ import streamlit as st
+ from etal import *
+ from egpt import *
+
+
+ class elit:
+     def __init__(self):
+         st.set_page_config(page_title='Legacy - Mental Health', layout='centered')
+         self.displayHeader()
+         self.modelChoice = st.radio('Choose a Model', ['etal', 'egpt'])
+         if self.modelChoice == 'etal':
+             self.displayEtalPanel()
+         elif self.modelChoice == 'egpt':
+             self.displayEgptPanel()
+
+     def displayHeader(self):
+         st.title('Legacy - Mental Health')
+         st.markdown('[Open Google Colab Notebook for Analysis](https://colab.research.google.com/drive/1UVrgohHSifjsw2OVP8j8EfDs_qeTOkCn?usp=sharing)')
+
+     def displayEtalPanel(self):
+         st.subheader('etal Model - Usage & Response')
+         inputText = st.text_area('Enter Context for etal', placeholder='Type the context here...')
+         if st.button('Get Response from etal'):
+             model = etal()
+             response = model.predict(inputText)
+             st.write('Response:', response)
+
+     def displayEgptPanel(self):
+         st.subheader('egpt Model - Usage & Response')
+         inputText = st.text_area('Enter Context for egpt', placeholder='Type the context here...')
+         if st.button('Get Response from egpt'):
+             apiKey = st.secrets['openai_api_key']
+             model = egpt(apiKey)
+             response = model.respond(inputText)
+             st.write('Response:', response)
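Both panels construct a fresh model on every button click: `etal()` reloads `bert-base-uncased` (and, since `load()` is never called, predicts with a freshly initialised classification head), while `egpt(apiKey)` re-encodes the entire counseling dataset before answering. Streamlit documents `st.cache_resource` for keeping such objects alive across reruns. The sketch below shows the same build-once idea with `functools.lru_cache` so it runs without Streamlit; `get_model` and `ExpensiveModel` are hypothetical stand-ins.

```python
from functools import lru_cache


class ExpensiveModel:
    """Hypothetical stand-in for etal()/egpt(); counts how often it is constructed."""

    instances = 0

    def __init__(self, name: str):
        ExpensiveModel.instances += 1
        self.name = name

    def predict(self, text: str) -> str:
        return f"{self.name}:{len(text)}"


@lru_cache(maxsize=None)
def get_model(name: str) -> ExpensiveModel:
    """Build each model once per process and reuse it on later calls."""
    # A Streamlit app re-executes its main script on every interaction, so in
    # elit.py the decorator to use would be @st.cache_resource, not lru_cache.
    return ExpensiveModel(name)


if __name__ == "__main__":
    for _ in range(3):  # simulate three button clicks
        print(get_model("etal").predict("hello"))  # etal:5
    print(ExpensiveModel.instances)  # 1
```

Cache keys must be hashable, which holds for the model name here and for the API key string that `egpt` takes.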
etal.py ADDED
@@ -0,0 +1,190 @@
+ import warnings
+ warnings.filterwarnings('ignore')
+ warnings.filterwarnings("ignore", category=UserWarning)
+ import torchvision
+ torchvision.disable_beta_transforms_warning()
+
+
+ import os
+ import re
+ from transformers import BertTokenizer, BertForSequenceClassification
+ from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
+ from sklearn.model_selection import train_test_split
+ from sklearn.metrics import classification_report
+ import torch
+ import torch.nn as nn
+ import numpy as np
+ from alive_progress import alive_bar
+
+
+ class Preprocessor:
+     def __init__(self, modelName='bert-base-uncased'):
+         self.tokenizer = BertTokenizer.from_pretrained(modelName)
+         self.labelMap = {
+             0: 'Anxiety',
+             1: 'Depression',
+             2: 'Stress',
+             3: 'Happiness',
+             4: 'Relationship Issues',
+             5: 'Self-Harm',
+             6: 'Substance Abuse',
+             7: 'Trauma',
+             8: 'Obsessive Compulsive Disorder',
+             9: 'Eating Disorders',
+             10: 'Grief',
+             11: 'Phobias',
+             12: 'Bipolar Disorder',
+             13: 'Post-Traumatic Stress Disorder',
+             14: 'Mental Fatigue',
+             15: 'Mood Swings',
+             16: 'Anger Management',
+             17: 'Social Isolation',
+             18: 'Perfectionism',
+             19: 'Low Self-Esteem',
+             20: 'Family Issues'
+         }
+
+         self.keywords = {
+             'anxiety': 0,
+             'depressed': 1,
+             'sad': 1,
+             'stress': 2,
+             'happy': 3,
+             'relationship': 4,
+             'self-harm': 5,
+             'substance': 6,
+             'trauma': 7,
+             'ocd': 8,
+             'eating': 9,
+             'grief': 10,
+             'phobia': 11,
+             'bipolar': 12,
+             'ptsd': 13,
+             'fatigue': 14,
+             'mood': 15,
+             'anger': 16,
+             'isolated': 17,
+             'perfectionism': 18,
+             'self-esteem': 19,
+             'family': 20
+         }
+
+     def tokenizeText(self, text, maxLength=128):
+         return self.tokenizer(
+             text,
+             padding='max_length',
+             truncation=True,
+             max_length=maxLength,
+             return_tensors='pt'
+         )
+
+     def preprocessDataset(self, texts):
+         inputIds, attentionMasks = [], []
+         for text in texts:
+             encodedDict = self.tokenizeText(text)
+             inputIds.append(encodedDict['input_ids'])
+             attentionMasks.append(encodedDict['attention_mask'])
+         return torch.cat(inputIds, dim=0), torch.cat(attentionMasks, dim=0)
+
+     def labelContext(self, context):
+         context = context.lower()
+         pattern = r'\b(?:' + '|'.join(re.escape(keyword) for keyword in self.keywords.keys()) + r')\b'
+         match = re.search(pattern, context)
+         return self.keywords[match.group(0)] if match else None
+
+
+ class etal(Preprocessor):
+     def __init__(self, modelName='bert-base-uncased', numLabels=21):
+         super().__init__(modelName)
+         self.model = BertForSequenceClassification.from_pretrained(modelName, num_labels=numLabels)
+         self.criterion = nn.CrossEntropyLoss()
+
+     def train(self, texts, labels, epochs=3, batchSize=8, learningRate=2e-5):
+         inputIds, attentionMasks = self.preprocessDataset(texts)
+         labels = torch.tensor(labels, dtype=torch.long)
+
+         trainIdx, valIdx = train_test_split(np.arange(len(labels)), test_size=0.2, random_state=42)
+         trainIds, valIds = inputIds[trainIdx], inputIds[valIdx]
+         trainMasks, valMasks = attentionMasks[trainIdx], attentionMasks[valIdx]
+         trainLabels, valLabels = labels[trainIdx], labels[valIdx]
+
+         trainData = torch.utils.data.TensorDataset(trainIds, trainMasks, trainLabels)
+         valData = torch.utils.data.TensorDataset(valIds, valMasks, valLabels)
+         trainLoader = torch.utils.data.DataLoader(trainData, batch_size=batchSize, shuffle=True)
+         valLoader = torch.utils.data.DataLoader(valData, batch_size=batchSize)
+
+         optimizer = torch.optim.AdamW(self.model.parameters(), lr=learningRate)
+         bestValLoss = float('inf')
+
+         with alive_bar(epochs, title='Training Progress') as bar:
+             for epoch in range(epochs):
+                 totalLoss = 0
+                 self.model.train()
+                 for i, batch in enumerate(trainLoader):
+                     batchIds, batchMasks, batchLabels = batch
+                     self.model.zero_grad()
+
+                     outputs = self.model(input_ids=batchIds, attention_mask=batchMasks, labels=batchLabels)
+                     loss = outputs.loss
+                     totalLoss += loss.item()
+                     loss.backward()
+                     optimizer.step()
+
+                     print(f"Epoch {epoch + 1}/{epochs}, Batch {i + 1}/{len(trainLoader)}, Loss: {loss.item()}")
+
+                 avgTrainLoss = totalLoss / len(trainLoader)
+                 valLoss = self.evaluate(valLoader)
+                 if valLoss < bestValLoss:
+                     bestValLoss = valLoss
+                     self.save('models', f'e{epoch}l{valLoss}.pt')
+                     print(f"Model State Dict Saved at: {os.path.join(os.getcwd(), 'models', f'e{epoch}l{valLoss}.pt')}")
+                 print(f'Epoch {epoch + 1}, Train Loss: {avgTrainLoss}, Validation Loss: {valLoss}')
+                 bar()
+
+     def evaluate(self, dataLoader):
+         self.model.eval()
+         predictions, trueLabels = [], []
+         totalLoss = 0
+         with torch.no_grad():
+             for batch in dataLoader:
+                 batchIds, batchMasks, batchLabels = batch
+                 outputs = self.model(input_ids=batchIds, attention_mask=batchMasks, labels=batchLabels)
+                 logits = outputs.logits
+                 loss = outputs.loss
+                 totalLoss += loss.item()
+                 predictions.extend(torch.argmax(logits, axis=1).cpu().numpy())
+                 trueLabels.extend(batchLabels.cpu().numpy())
+         print(classification_report(trueLabels, predictions))
+         return totalLoss / len(dataLoader)
+
+     def predict(self, text):
+         self.model.eval()
+         tokens = self.tokenizeText(text)
+         with torch.no_grad():
+             outputs = self.model(input_ids=tokens['input_ids'], attention_mask=tokens['attention_mask'])
+         prediction = torch.argmax(outputs.logits, axis=1).item()
+         return self.labelMap.get(prediction)
+
+     def save(self, folder, filename):
+         if not os.path.exists(folder):
+             os.makedirs(folder)
+         filepath = os.path.join(folder, filename)
+         torch.save(self.model.state_dict(), filepath)
+
+     def load(self, filePath, best = True):
+         if best:
+             modelFiles = [f for f in os.listdir(filePath) if f.endswith('.pt')]
+             if not modelFiles:
+                 print('No model files found in the specified folder.')
+                 return
+
+             # Checkpoints are saved as 'e{epoch}l{valLoss}.pt'; sort on the full
+             # validation loss parsed from the name and take the lowest.
+             modelFiles.sort(key=lambda x: float(os.path.splitext(x)[0].split('l', 1)[1]))
+
+             bestModelFile = modelFiles[0]
+             modelPath = os.path.join(filePath, bestModelFile)
+             self.model.load_state_dict(torch.load(modelPath))
+         else:
+             self.model.load_state_dict(torch.load(filePath))
+
+         print(f'Loaded model state dict')
+         self.model.eval()
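`labelContext` maps a text to a label id using the first keyword it finds, which looks intended for producing weak training labels for `train` (the notebook that would confirm this is not rendered in this commit). A standalone sketch of the same regex rule, using a trimmed, illustrative subset of the keyword table, makes two properties visible: the leftmost keyword in the text wins regardless of dictionary order, and the `\b` word boundaries mean inflected forms such as "depression" do not match the keyword "depressed".

```python
import re
from typing import Dict, Optional

# Trimmed, illustrative subset of Preprocessor.keywords (ids follow its labelMap).
KEYWORDS: Dict[str, int] = {
    "anxiety": 0,
    "depressed": 1,
    "sad": 1,
    "stress": 2,
    "self-harm": 5,
    "grief": 10,
    "family": 20,
}

# Same construction as labelContext: one alternation bounded by word boundaries.
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b")


def label_context(context: str) -> Optional[int]:
    """Return the id of the leftmost keyword in `context`, or None if none match."""
    match = PATTERN.search(context.lower())
    return KEYWORDS[match.group(0)] if match else None


if __name__ == "__main__":
    print(label_context("My family situation causes me stress"))    # 20
    print(label_context("I have been dealing with depression"))     # None
    print(label_context("Thoughts of self-harm keep coming back"))  # 5
```

Because texts with no keyword come back as `None`, they would need to be filtered out before the labels are passed to `torch.tensor(labels, dtype=torch.long)` in `train`.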
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ # pip install -r requirements.txt
+
+ torch
+ torchvision
+ transformers
+ scikit-learn
+ numpy
+ alive-progress
+ openai==0.28
+ pandas
+ sentence-transformers
+ streamlit