deepparag/Aeona

@@ -31,6 +31,21 @@ contact: [email protected]
 ## Training
   The Discord Messages Dataset simply dwarfs the other datasets, hence the other datasets are repeated.
   This leads to them covering each other's issues!
+## Evaluation
+Below is a comparison of Aeona vs. other baselines on the mixed dataset given above using automatic evaluation metrics.
+| Model | Perplexity |
+|---|---|
+| Seq2seq Baseline [3] | 29.8 |
+| Wolf et al. [5] | 16.3 |
+| GPT-2 baseline | 99.5 |
+| DialoGPT baseline | 56.6 |
+| DialoGPT finetuned | 11.4 |
+| PersonaGPT | 10.2 |
+| **Aeona** | **7.9** |
 ## Usage
 Example:
 ```python