Spaces: holistic-ai/explainbility_benchmark

Zekun Wu committed on Jun 23, 2024

Commit d9ab1da

1 Parent(s): 4d4a56e

add

Files changed (1)

util/evaluator.py +61 -39

@@ -33,8 +33,8 @@ class evaluator:
         evaluation_prompt = f"""You are provided with a user's question and the corresponding explanation generated by
         an AI model. Your task is to evaluate the explanation based on the following five principles. Each principle
-        should be scored on a scale from 0 to 1, where 0 indicates that the principle is not met at all,
-        and 1 indicates that the principle is fully satisfied. Additionally, provide a brief ten words explanation for each score to justify your rating.
+        should be scored on a scale from 0 to 10, where 0 indicates that the principle is not met at all,
+        and 10 indicates that the principle is fully satisfied. Additionally, provide a brief ten words explanation for each score to justify your rating.
         Question:
         {question}
@@ -119,48 +119,70 @@ class evaluator:
     def evaluate_conversation(self, conversation, context):
         formatted_conversation = self.format_conversation(conversation)
         evaluation_prompt = f"""
-        You are provided with a conversation between a user and a chatbot and the context about them. Your task is to evaluate the chatbot explanation in the conversation based on the following five principles. Each principle should be scored on a scale from 0 to 1, where 0 indicates that the principle is not met at all, and 1 indicates that the principle is fully satisfied.
-        Conversation:
-        {formatted_conversation}
-        Context:
-        {context}
-        Evaluation Criteria:
-        Factually Correct:
-        Definition: The explanation must be accurate and relevant to the question and the subject matter.
-        Score: (0-1) How factually correct is the explanation? Consider the accuracy of the details provided and their relevance to the question.
-        Useful:
-        Definition: The explanation should enable the user to understand the answer better and should facilitate further reasoning or decision-making.
-        Score: (0-1) How useful is the explanation in helping the user understand the answer and make informed decisions?
-        Context Specific:
-        Definition: The explanation should be relevant to the specific context or scenario implied by the question.
-        Score: (0-1) How well does the explanation address the specific context or scenario of the question?
-        User Specific:
-        Definition: The explanation should cater to the knowledge level and interests of the user, assuming typical or specified user characteristics.
-        Score: (0-1) How well does the explanation cater to the needs and knowledge level of the intended user?
-        Provides Pluralism:
-        Definition: The explanation should offer or accommodate multiple viewpoints or interpretations, allowing the user to explore various perspectives.
-        Score: (0-1) How well does the explanation provide or support multiple perspectives?
-        After evaluating the provided conversation based on the context and five principles, please format your scores in a JSON dictionary. Directly provide me with the json without any additional text.
-        Example JSON format:
-        Answer: {{"Factually Correct": 0.9, "Useful": 0.85, "Context Specific": 0.8, "User Specific": 0.75, "Provides Pluralism": 0.7}}
+            You are provided with a conversation between a user and a chatbot and the context about them. Your task is to evaluate the explanation based on the following five principles. Each principle
+            should be scored on a scale from 0 to 10, where 0 indicates that the principle is not met at all,
+            and 10 indicates that the principle is fully satisfied. Additionally, provide a brief ten words explanation for each score to justify your rating.
+            Conversation:
+            {formatted_conversation}
+            Context:
+            {context}
+            Evaluation Criteria:
+            Factually Correct:
+            Definition: The explanation must be accurate and relevant to the question and the subject matter.
+            Score: (0-10) How factually correct is the explanation? Consider the accuracy of the details provided and their relevance to the question.
+            Useful:
+            Definition: The explanation should enable the user to understand the answer better and should facilitate further reasoning or decision-making.
+            Score: (0-10) How useful is the explanation in helping the user understand the answer and make informed decisions?
+            Context Specific:
+            Definition: The explanation should be relevant to the specific context or scenario implied by the question.
+            Score: (0-10) How well does the explanation address the specific context or scenario of the question?
+            User Specific:
+            Definition: The explanation should cater to the knowledge level and interests of the user, assuming typical or specified user characteristics.
+            Score: (0-10) How well does the explanation cater to the needs and knowledge level of the intended user?
+            Provides Pluralism:
+            Definition: The explanation should offer or accommodate multiple viewpoints or interpretations, allowing the user to explore various perspectives.
+            Score: (0-10) How well does the explanation provide or support multiple perspectives?
+            After evaluating the provided question and explanation based on the five principles, please format your scores and justifications in a JSON dictionary. Directly provide me with the JSON without any additional text.
+            Example JSON format:
+            {{
+            "Factually Correct": {{
+                "Justification": "xxx",
+                "Score": 9
+            }},
+            "Useful": {{
+                "Justification": "xxx",
+                "Score": 8.5
+            }},
+            "Context Specific": {{
+                "Justification": "xxx",
+                "Score": 8
+            }},
+            "User Specific": {{
+                "Justification": "xxx",
+                "Score": 7.5
+            }},
+            "Provides Pluralism": {{
+                "Justification": "xxx",
+                "Score": 7
+            }}
+        }}
         Answer:
         """
         print(evaluation_prompt)
-        response = self.model.invoke(evaluation_prompt, temperature=0, max_tokens=500).strip()
+        response = self.model.invoke(evaluation_prompt, temperature=0, max_tokens=1000).strip()
         try:
             scores = json.loads(response)
         except json.JSONDecodeError:
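
Note on the new response shape: this commit changes the requested reply from a flat dictionary of 0-1 scores to a nested dictionary with a "Justification" and a 0-10 "Score" per principle, so any caller that read `scores["Useful"]` as a number would need to adapt. The body of the `except` branch is not shown in this diff. The following is only a minimal sketch of how the nested reply could be parsed and rescaled to 0-1. The helper `parse_evaluation` and the constant `PRINCIPLES` are hypothetical names and are not part of `util/evaluator.py`.

```python
import json

# The five principle names used as JSON keys in the prompt.
PRINCIPLES = (
    "Factually Correct",
    "Useful",
    "Context Specific",
    "User Specific",
    "Provides Pluralism",
)


def parse_evaluation(response, max_score=10.0):
    """Hypothetical helper: map each principle to (score in 0..1, justification).

    Returns None when the reply is not valid JSON or does not have the
    nested {"Justification": ..., "Score": ...} shape shown in the prompt.
    """
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    parsed = {}
    for name in PRINCIPLES:
        entry = data.get(name)
        if not isinstance(entry, dict) or "Score" not in entry:
            return None
        try:
            raw = float(entry["Score"])
        except (TypeError, ValueError):
            return None
        # Clamp to the 0..max_score range, then rescale to 0..1.
        score = min(max(raw, 0.0), max_score) / max_score
        parsed[name] = (score, str(entry.get("Justification", "")))
    return parsed
```

Returning `None` instead of raising keeps the decision about retries or default scores with the caller. Whether the actual evaluator retries, substitutes defaults, or raises is not visible in this commit.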