whoisjones's Collections
MastermindEval
Updated about 23 hours ago
Evaluating the reasoning capabilities of LLMs using the game of Mastermind (paper forthcoming).
whoisjones/mastermind_35_random • Updated Dec 18, 2024 • 37.1k • 54
whoisjones/mastermind_46_random • Updated Dec 18, 2024 • 36.1k • 52
whoisjones/mastermind_46_close • Updated Dec 18, 2024 • 36.1k • 49
whoisjones/mastermind_24_random • Updated Dec 18, 2024 • 30.4k • 46
whoisjones/mastermind_24_close • Updated Dec 18, 2024 • 30.4k • 52
whoisjones/mastermind_35_close • Updated Dec 18, 2024 • 37.1k • 47
whoisjones/mastermind_35 • Updated Dec 5, 2024 • 37.1k • 42
whoisjones/mastermind_46 • Updated Dec 5, 2024 • 36.1k • 41
whoisjones/mastermind_24 • Updated Dec 5, 2024 • 30.4k • 40