Leaderboards 🔥 - a sugatoray Collection

Leaderboards 🔥

updated 3 days ago

A collection of Leaderboards for LLMs ⚡️⚖️ 🤗

Running

4.12k

Chatbot Arena Leaderboard

🏆

Display chatbot competition leaderboard
Running on CPU Upgrade

12.6k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running

187

Yet Another LLM Leaderboard

🌖

Run a Streamlit web app
Running on CPU Upgrade

132

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Running

431

LLM-Perf Leaderboard

🏆

Explore LLM performance across hardware
Running on CPU Upgrade

89

LLM Safety Leaderboard

🥇

View and submit machine learning model evaluations
Running

222

AI2 WildBench Leaderboard (V2)

🦁

Display and explore model leaderboards and chat history
Runtime error

30

Contextual Leaderboard

🐨
Running on CPU Upgrade

4.95k

MTEB Leaderboard

🥇

Select benchmarks and languages for text embeddings evaluation
Running on CPU Upgrade

50

Open CoT Leaderboard

🥇

Track, rank and evaluate open LLMs' CoT quality
Running

288

LLM Performance Leaderboard

🐨

View LLM Performance Leaderboard
Running

183

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

59

The timm Leaderboard

🏆

Display and analyze PyTorch Image Models leaderboard
Running

65

Open FinLLM Leaderboard

🥇

Browse and submit large language model evaluations
Running

101

Open VLM Video Leaderboard

🌎

VLMEvalKit Eval Results in video understanding benchmark
Running

42

MEGA-Bench Leaderboard

🥇

A leaderboard for multimodal models
Running on CPU Upgrade

88

Open LLM Leaderboard Model Comparator

🏆

Compare Open LLM Leaderboard results
Running

116

Vidore Leaderboard

🥇

Display Visual Document Retrieval leaderboard
Running

92

Judge Arena

💻

Compare AI models by voting on responses
Running on CPU Upgrade

644

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running on TPU v5e

9

Keras Chatbot Battle

💬

Interact with multiple chatbots simultaneously
Sleeping

4

OmniEval

🥇
Running

5

OmniEval

🥇

Official Leaderboard for OmniEval
open-llm-leaderboard/contents

Viewer • Updated 34 minutes ago • 4.29k • 16.4k • 15
Running on CPU Upgrade

62

LeaderboardExplorer

🔎

Filter and display leaderboards based on selected criteria
Running on CPU Upgrade

287

GAIA Leaderboard

🦾

Submit and evaluate text-based models
m-ric/agents_small_benchmark

Viewer • Updated Jan 19, 2024 • 100 • 137 • 10
Running on Zero

310

TTS Spaces Arena

🤗

Blind vote on HF TTS models!
Running

98

MTEB Arena

⚔

Teach, test, evaluate language models with MTEB Arena
Running on Zero

269

GenAI Arena

📈

Realtime Image/Video Gen AI Arena
Running on CPU Upgrade

223

Agent Leaderboard

💬

Ranking of LLMs for agentic tasks
Running on CPU Upgrade

635

Open ASR Leaderboard

🏆

Request evaluation for speech models
Running

31

Open LMM Reasoning Leaderboard

🥇

A Leaderboard that demonstrates LMM reasoning capabilities