Inclusion of non-open LLMs for straightforward comparison
Although it is orthogonal to the objective of this space, including results for closed models from Google, OpenAI, Anthropic, Cohere, etc. on the same benchmarks would help users find open-source LLMs that are close enough to the closed LLMs for their particular use case. It would greatly reduce the time spent on experimentation.
Thank You!
Second this, because it's easier to judge something when I already have a reference point for it. ChatGPT 3.5 and ChatGPT 4 are anchors that a lot of people are likely to know.
Hi! We won't do this, as this is a leaderboard for open models, both for philosophical reasons (openness is cool) and for practical ones. We want to ensure that the results we display are accurate and reproducible, but 1) commercial closed models can change behind their API, which would make any score recorded at a given time incorrect, and 2) we re-run everything on our cluster to ensure all models are evaluated on the same setup, and you can't do that for these models.