Interpretation of result details?
Hello :)
I want to extract some example results from models for demonstration purposes. I am currently struggling with the result details datasets (for example, https://huggingface.co/datasets/open-llm-leaderboard-old/details_davidkim205__Rhea-72b-v0.5).
I want to extract each example, the corresponding choices, the answer predicted by the model, and whether that answer is true or false.
Looking, for instance, at https://huggingface.co/datasets/open-llm-leaderboard-old/details_davidkim205__Rhea-72b-v0.5/blob/main/2024-03-23T20-12-54.617185/details_harness%7Cwinogrande%7C5_2024-03-23T20-12-54.617185.parquet, I am wondering why the choices field in the dataset is always empty. I have also seen this behaviour on ARC, for example. Why are the choices not in there?
Is there any way to get the requested information out of those files?
Thank you in advance!
Hi @nicobuko,
Sorry, we completely missed issues opened on the archived version!
This could be a parsing issue; we changed the saving format between v1 and v2 to fix a couple of bugs. Tagging @alozowski, who might have some time to investigate :)
Hi @nicobuko,
Thank you for your question and for providing the example! Really sorry that we missed it back in July.
Regarding why the `choices` field is empty in datasets like the one you referenced (e.g., Winogrande or ARC), here's the reason:
At the time those results were generated, the `choices` field was not populated for all evaluation types. This was a design decision in earlier versions of the evaluation setup: the choice details were not explicitly stored in the dataset. Instead, the `input_tokens` field can be used to reconstruct the input text, including the missing choices. By decoding `input_tokens` with the same tokenizer that was used for encoding, you can retrieve the original text.
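If it helps, here is a minimal sketch of that decoding step (untested against the archived files, so treat the exact schema handling as an assumption): it downloads the Winogrande details parquet linked above, loads it with pandas, and decodes `input_tokens` with the model's own tokenizer.

```python
import pandas as pd
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Download the details parquet referenced in the question
# (the %7C in the URL is a literal "|" in the file name).
path = hf_hub_download(
    repo_id="open-llm-leaderboard-old/details_davidkim205__Rhea-72b-v0.5",
    repo_type="dataset",
    filename=(
        "2024-03-23T20-12-54.617185/"
        "details_harness|winogrande|5_2024-03-23T20-12-54.617185.parquet"
    ),
)
df = pd.read_parquet(path)

# Use the same tokenizer the model was evaluated with.
tokenizer = AutoTokenizer.from_pretrained("davidkim205/Rhea-72b-v0.5")

tokens = df.iloc[0]["input_tokens"]
# Assumption: `input_tokens` holds a flat sequence of token IDs.
# If your file stores one sequence per continuation instead, decode
# the first one: tokens = tokens[0]
print(tokenizer.decode(tokens, skip_special_tokens=True))
```

The decoded prompt should then contain the full input text, including the choices that are missing from the `choices` column.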
Feel free to send your models to our updated Leaderboard V2 and we will be happy to help you in the discussions there!