WildEval

non-profit

wild_eval



AI & ML interests

None defined yet.

Recent Activity

yuchenlin authored a paper 5 days ago

Small Models Struggle to Learn from Strong Reasoners

DongfuJiang authored a paper 20 days ago

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

yuchenlin authored a paper 21 days ago

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning


Spaces (1)

Zebra Logic Bench

Explore and compare Zebra Puzzle solving models

Models

None public yet

Datasets (9)

WildEval/ZebraLogic

Updated 21 days ago • 4.26k • 227 • 5

WildEval/G-PlanET

Updated Aug 1, 2024 • 1.42k • 50

WildEval/ZeroEval

Updated Jul 23, 2024 • 4.61k • 411

WildEval/WildBench-V2

Updated May 22, 2024 • 2.05k • 77

WildEval/WildBench-Results-v2-internal

Updated May 21, 2024 • 30k • 224

WildEval/WildBench-Results-V2

Updated May 20, 2024 • 10.2k • 143

WildEval/WildBench-v2-dev

Updated Apr 19, 2024 • 5.99k • 27

WildEval/WildBench-dev

Updated Apr 19, 2024 • 13 • 1

WildEval/NaturalChats

Updated Apr 18, 2024 • 641k • 30