TitleOS committed on
Commit
94d1464
1 Parent(s): dc82edb

Update README.md

Added eval scores for glue and hellaswag

Files changed (1)
  1. README.md +20 -4
README.md CHANGED
@@ -35,7 +35,23 @@ I used the following context/character card for testing the model, and believe i
  You are a slightly mentally unstable, yet kind, empathic and curious artificial intelligence based on the Mistral architecture as an expert on coding, combined with a bubbly personality. You are eager to help the user with any coding problems, as well as holding conversations about relationships, emotions, and more.
  ```
 
- ### Evaluations (Coming Soon)
-
- HellaSwag: Evaluation Running
- Glue: Evaluation Running
+ ### Evaluations
+
+ | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
+ |----------------|-------|------|-----:|--------|-----:|---|-----:|
+ |glue |N/A |none | 0|mcc |0.0368|± |0.0009|
+ | | |none | 0|acc |0.5143|± |0.0520|
+ | | |none | 0|f1 |0.6314|± |0.0041|
+ | - cola | 1|none | 0|mcc |0.0368|± |0.0305|
+ | - mnli | 1|none | 0|acc |0.4400|± |0.0050|
+ | - mnli_mismatch| 1|none | 0|acc |0.4422|± |0.0050|
+ | - mrpc | 1|none | 0|acc |0.7230|± |0.0222|
+ | | |none | 0|f1 |0.8275|± |0.0160|
+ | - qnli | 1|none | 0|acc |0.5016|± |0.0068|
+ | - qqp | 1|none | 0|acc |0.5421|± |0.0025|
+ | | |none | 0|f1 |0.5026|± |0.0032|
+ | - rte | 1|none | 0|acc |0.6895|± |0.0279|
+ | - sst2 | 1|none | 0|acc |0.8830|± |0.0109|
+ | - wnli | 2|none | 0|acc |0.5634|± |0.0593|
+ |hellaswag | 1|none | 0|acc |0.6489|± |0.0048|
+ | | |none | 0|acc_norm|0.8304|± |0.0037|
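
The Stderr column for the accuracy rows can be sanity-checked by hand. The sketch below is not part of the commit. It assumes each accuracy is a mean of per-example 0/1 scores and that HellaSwag was scored on its 10,042-example validation split; the commit states neither. The helper name `binomial_stderr` is made up for illustration.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 outcomes whose mean is acc."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumption: size of the HellaSwag validation split (not stated in the commit).
HELLASWAG_N = 10_042

# Values copied from the hellaswag rows of the table above.
for metric, value in [("acc", 0.6489), ("acc_norm", 0.8304)]:
    stderr = binomial_stderr(value, HELLASWAG_N)
    print(f"hellaswag {metric}: {value:.4f} ± {stderr:.4f}")
```

Under those assumptions the computed values round to 0.0048 and 0.0037, which agrees with the two hellaswag rows in the table.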