Commit 152c62f by JohnnyBoy00 · 1 Parent(s): eddb52f

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -78,14 +78,14 @@ The following hyperparameters were used during training:
 
  ## Evaluation results
 
- The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used for evaluation of the labels.
+ The generated feedback was evaluated by means of the [SacreBLEU](https://huggingface.co/spaces/evaluate-metric/sacrebleu), [ROUGE-2](https://huggingface.co/spaces/evaluate-metric/rouge), [METEOR](https://huggingface.co/spaces/evaluate-metric/meteor), [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) metrics from HuggingFace, while the [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) scores from scikit-learn were used for evaluation of the labels.
 
  The following results were achieved.
 
- | Split | SacreBLEU | ROUGE | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
- | --------------------- | :-------: | :---: | :----: | :-------: | :------: | :---------: | :------: |
- | test_unseen_answers | 36.0 | 49.1 | 60.8 | 69.5 | 76.0 | 73.0 | 53.4 |
- | test_unseen_questions | 2.4 | 20.1 | 28.5 | 36.6 | 51.6 | 41.0 | 27.9 |
+ | Split | SacreBLEU | ROUGE-2 | METEOR | BERTScore | Accuracy | Weighted F1 | Macro F1 |
+ | --------------------- | :-------: | :-----: | :----: | :-------: | :------: | :---------: | :------: |
+ | test_unseen_answers | 36.0 | 49.1 | 60.8 | 69.5 | 76.0 | 73.0 | 53.4 |
+ | test_unseen_questions | 2.4 | 20.1 | 28.5 | 36.6 | 51.6 | 41.0 | 27.9 |
 
 
  The script used to compute these metrics and perform evaluation can be found in the `evaluation.py` file in this repository.
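
The table reports both a weighted and a macro F1 for the labels, and the two can differ noticeably when classes are imbalanced. The following is a minimal, dependency-free sketch of how the two averages relate; it is not the repository's `evaluation.py`, and the label names and toy predictions are hypothetical values chosen only for illustration. In practice, scikit-learn's `f1_score` with `average="weighted"` or `average="macro"` would be used instead of these hand-written helpers.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per label
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Hypothetical, imbalanced toy data (not taken from the model's test sets).
y_true = ["Correct", "Correct", "Correct", "Correct", "Partially correct", "Incorrect"]
y_pred = ["Correct", "Correct", "Correct", "Correct", "Correct", "Incorrect"]

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
print(f"macro F1    = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the rare class is never predicted, so its F1 of zero pulls the macro average well below the weighted one. That is the same direction as the gap between the Weighted F1 and Macro F1 columns above, although the sketch does not show what causes the gap for this model.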