100 on HellaSwag benchmark
#1 opened by TNTOutburst
Hello, I wanted to check whether this model might have a contamination issue, given that it scores 100 on the HellaSwag benchmark.
Hello, thanks for taking an interest in the model.
The correct HellaSwag result (acc_norm) is 77, per our local evaluation. We have been in touch with Hugging Face about the incorrect benchmark numbers.
The correct benchmark results, as checked locally, are as follows:
MMLU: 52
ARC: 55
HellaSwag: 77
TruthfulQA: 38
Winogrande: 70
GSM8K: 2
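To put the discrepancy in context, here is a minimal sketch of how a single wrong score shifts an aggregate. It assumes a plain unweighted mean over the six benchmarks, which may not match how any particular leaderboard aggregates; `mean_score` is a hypothetical helper, not part of any evaluation library.

```python
# Locally evaluated scores quoted in this reply.
local_scores = {
    "MMLU": 52,
    "ARC": 55,
    "HellaSwag": 77,
    "TruthfulQA": 38,
    "Winogrande": 70,
    "GSM8K": 2,
}


def mean_score(scores):
    """Unweighted mean of benchmark scores (assumed aggregation, hypothetical helper)."""
    return sum(scores.values()) / len(scores)


# Same scores, but with the disputed HellaSwag value of 100 swapped in.
disputed_scores = {**local_scores, "HellaSwag": 100}

print(f"mean with local HellaSwag (77):     {mean_score(local_scores):.2f}")
print(f"mean with disputed HellaSwag (100): {mean_score(disputed_scores):.2f}")
```

Under that assumption the mean moves from 49.0 to roughly 52.8, so the HellaSwag error alone accounts for a shift of almost four points.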
ceadar-ie changed discussion status to closed