[LMSYS Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) is an
LLM evaluation platform. This Space presents an alternative ranking
method based on the [Bradley–Terry
model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model)
(BT). It takes a Bayesian approach to BT parameter estimation, unlike
the maximum likelihood estimation (MLE) approach used by the LMSYS
organization.
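
To make the contrast between Bayesian and MLE estimation concrete, here is a minimal sketch of the BT preference probability and an unnormalized log posterior over model abilities. The zero-mean Normal prior, the `prior_sd` parameter, and the function names are assumptions made for illustration; the prior and inference method this Space actually uses are not stated here.

```python
import math


def bt_prob(ability_i, ability_j):
    """Bradley-Terry probability that model i is preferred to model j.

    Abilities are on the log scale, so the probability is a logistic
    function of their difference.
    """
    return 1.0 / (1.0 + math.exp(-(ability_i - ability_j)))


def log_posterior(abilities, outcomes, prior_sd=1.0):
    """Unnormalized log posterior of the abilities given pairwise outcomes.

    abilities: dict mapping model name -> log-scale ability
    outcomes:  list of (winner, loser) model-name pairs
    prior_sd:  standard deviation of an ASSUMED zero-mean Normal prior
    """
    # Assumed prior: independent Normal(0, prior_sd) on each ability.
    log_prior = sum(-0.5 * (a / prior_sd) ** 2 for a in abilities.values())
    # BT likelihood: each comparison contributes log P(winner beats loser).
    log_lik = sum(
        math.log(bt_prob(abilities[winner], abilities[loser]))
        for winner, loser in outcomes
    )
    return log_prior + log_lik
```

Dropping the `log_prior` term leaves the log likelihood, which is the quantity an MLE fit maximizes. A Bayesian fit instead characterizes the full posterior, so each ability comes with an uncertainty estimate rather than a single point value.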
This Space has two sections. The first ranks models by estimated
ability: the figure on the right visualizes this ranking for the top 10
models, while the table below presents the full set. The second section
estimates the probability that one model will be preferred to another.
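
One way to obtain such a preference probability from a Bayesian fit is to average the BT probability over posterior draws of the two models' abilities. The sketch below assumes paired draws of log-scale abilities are already available. The function name and the sample values are hypothetical, not taken from this Space.

```python
import math


def preference_probability(samples_i, samples_j):
    """Posterior-mean probability that model i is preferred to model j.

    samples_i, samples_j: equal-length sequences of posterior draws of the
    two models' log-scale abilities, paired by draw index.
    """
    if len(samples_i) != len(samples_j):
        raise ValueError("posterior draws must be paired")
    if not samples_i:
        raise ValueError("need at least one posterior draw")
    # Evaluate the BT probability for each draw, then average over draws.
    probs = [
        1.0 / (1.0 + math.exp(-(a_i - a_j)))
        for a_i, a_j in zip(samples_i, samples_j)
    ]
    return sum(probs) / len(probs)


# Hypothetical posterior draws for two models (illustrative values only).
draws_a = [0.9, 1.1, 1.0, 1.2]
draws_b = [0.2, 0.4, 0.1, 0.3]
p_a_over_b = preference_probability(draws_a, draws_b)
```

Averaging over draws, rather than plugging in a single point estimate of each ability, lets the uncertainty in the abilities carry through to the reported probability.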