100 on HellaSwag benchmark

#1
by TNTOutburst - opened

Hello, I wanted to check whether this model might have a contamination issue, given that it scores 100 on the HellaSwag benchmark.

Hello, thanks for taking an interest in the model.

The correct HellaSwag result (acc_norm) is 77, per our local evaluation. We have been in touch with Hugging Face about this issue with the benchmark numbers.

The correct benchmark results, as evaluated locally, are as follows:

MMLU: 52
ARC: 55
HellaSwag: 77
TruthfulQA: 38
Winogrande: 70
GSM8K: 2
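
For anyone comparing these numbers against the leaderboard, here is a minimal Python sketch of the arithmetic. It only restates the scores listed above; the unweighted mean and the helper names `average` and `gap` are assumptions made for illustration, not part of any evaluation pipeline.

```python
# Locally evaluated scores, copied from the list above (percent).
local_scores = {
    "MMLU": 52,
    "ARC": 55,
    "HellaSwag": 77,
    "TruthfulQA": 38,
    "Winogrande": 70,
    "GSM8K": 2,
}

# HellaSwag score that prompted this discussion.
reported_hellaswag = 100


def average(scores):
    """Unweighted mean of the benchmark scores (an assumed summary)."""
    return sum(scores.values()) / len(scores)


def gap(reported, local):
    """Difference between a reported score and the local one."""
    return reported - local


print(average(local_scores))
print(gap(reported_hellaswag, local_scores["HellaSwag"]))
```

The second value is the size of the HellaSwag discrepancy between the reported 100 and the local 77.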

ceadar-ie changed discussion status to closed
