Feature suggestion: average of selected (rather than all) columns
I'd suggest providing an option for a column that is averaged over only the benchmarks the user selects, rather than over every metric in the table. Right now the average seems very sensitive to differences in individual metrics (namely DROP, whose wide variance can shift overall model standings pretty drastically), and the ability to include or exclude metrics would help tailor it to user preference. Ideally, this could be an optional toggle or a separate column altogether.
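To make the idea concrete, here is a rough sketch of what I mean, not how the leaderboard is actually implemented. The model names, the scores, and the helper names `selected_average` and `rank` are all made up for illustration.

```python
from statistics import mean

# Hypothetical scores, invented for illustration (NOT real leaderboard data).
SCORES = {
    "model_a": {"ARC": 60.0, "HellaSwag": 80.0, "MMLU": 55.0, "DROP": 10.0},
    "model_b": {"ARC": 58.0, "HellaSwag": 78.0, "MMLU": 54.0, "DROP": 50.0},
}


def selected_average(row, selected):
    """Average only the benchmarks named in `selected`."""
    if not selected:
        raise ValueError("select at least one benchmark")
    return mean(row[name] for name in selected)


def rank(scores, selected):
    """Return model names sorted by their selected-column average, best first."""
    return sorted(
        scores,
        key=lambda model: selected_average(scores[model], selected),
        reverse=True,
    )


all_columns = ["ARC", "HellaSwag", "MMLU", "DROP"]
without_drop = ["ARC", "HellaSwag", "MMLU"]

print(rank(SCORES, all_columns))   # ordering with every metric included
print(rank(SCORES, without_drop))  # ordering with DROP deselected
```

With these made-up numbers the two orderings differ, which is the kind of sensitivity to a single metric I'm describing.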
Agreed, and I hope they fix whatever is going wrong. In addition to that, there are certain metrics I tend to weight more heavily than others. I enjoy using LLMs for creative writing, and HellaSwag and Winogrande are typically my two go-tos since they're more focused on logical comprehension. Different people may value different benchmarks more strongly, such as the more knowledge-based ones (e.g., MMLU) or the more technical ones (e.g., GSM8K). Having the ability to sort according to user preference would be incredibly helpful, especially if more benchmarks get added in the future.
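For the weighting idea, something like the following is what I have in mind. It's only a sketch under my own assumptions: `weighted_average` is a hypothetical helper, and the scores and weights are placeholders, not real results.

```python
def weighted_average(row, weights):
    """Weighted mean of the benchmarks in `weights`; a weight of 0 excludes a benchmark."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(row[name] * w for name, w in weights.items()) / total


# Placeholder scores for one hypothetical model (not real data).
row = {"HellaSwag": 80.0, "Winogrande": 70.0, "MMLU": 50.0}

# Example preference: count HellaSwag and Winogrande twice as much as MMLU.
weights = {"HellaSwag": 2, "Winogrande": 2, "MMLU": 1}

print(weighted_average(row, weights))
```

Sorting the table by this value instead of the plain average would give each user their own ranking, and setting a weight to 0 covers the include/exclude case as well.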
Hi!
We removed DROP, and explained why here! Thank you all for raising interesting points in the discussions; they allowed us to start our investigation :)