crumb/shrink-v1

A roughly 200M-parameter model (I think the parameter count in the graphic here is wrong, but the benchmark values are correct) with the token embedding and language modelling head of Llama2-70b attached. Linear transformations map Llama2-70b's 8192-d space down to this model's 1024-d space.
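As a rough illustration of that wiring, here is a minimal NumPy sketch. It is one possible reading of the description, not the actual implementation: the names `down`, `up`, `small_body`, and `forward` are hypothetical, the matrices are random stand-ins, and the sizes are shrunk so it runs instantly (the real widths named above are 8192 and 1024). The text only mentions mapping down to 1024-d; the `up` map in front of the head is an assumption about how a 1024-d hidden state could be fed to an 8192-d head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in sizes; the widths in the description are 8192 and 1024.
VOCAB, D_BIG, D_SMALL = 50, 32, 8

# Random stand-ins for the borrowed Llama2-70b weights.
embed = rng.normal(size=(VOCAB, D_BIG))    # token embedding table
lm_head = rng.normal(size=(D_BIG, VOCAB))  # language modelling head

# Hypothetical linear transformations between the two widths.
down = rng.normal(size=(D_BIG, D_SMALL))   # big space -> small space
up = rng.normal(size=(D_SMALL, D_BIG))     # small space -> big space (assumed)


def small_body(h):
    """Placeholder for the small model's transformer blocks (identity here)."""
    return h


def forward(token_ids):
    h = embed[token_ids] @ down   # look up big embeddings, project down
    h = small_body(h)             # run the small model at the small width
    return (h @ up) @ lm_head     # project back up, apply the borrowed head


logits = forward(np.array([1, 2, 3]))
print(logits.shape)
```

Because `up @ lm_head` is itself a single linear map, the two could equally be folded into one small-width head; the sketch keeps them separate only to mirror the description.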

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
arc_challenge	Yaml	none	25	acc	0.1775	±	0.0112
		none	25	acc_norm	0.2133	±	0.0120
truthfulqa_mc2	Yaml	none	0	acc	0.4457	±	0.0152
winogrande	Yaml	none	5	acc	0.5154	±	0.0140
hellaswag	Yaml	none	10	acc	0.2832	±	0.0045
		none	10	acc_norm	0.3024	±	0.0046
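To read the Stderr column, it can help to express each score as a distance from random guessing in units of standard error. The sketch below does that for three rows of the table. The chance baselines are assumptions that are not part of the reported results (0.5 for the two-option winogrande, 0.25 for the nominally four-option arc_challenge and hellaswag), and `z_score` is a hypothetical helper name.

```python
def z_score(value, stderr, chance):
    """Distance of a reported accuracy from a chance baseline, in standard errors."""
    return (value - chance) / stderr


# (task, metric, value, stderr, assumed chance baseline)
rows = [
    ("arc_challenge", "acc_norm", 0.2133, 0.0120, 0.25),
    ("winogrande", "acc", 0.5154, 0.014, 0.50),
    ("hellaswag", "acc_norm", 0.3024, 0.0046, 0.25),
]

for task, metric, value, stderr, chance in rows:
    z = z_score(value, stderr, chance)
    print(f"{task:15s} {metric:9s} {value:.4f}  z vs chance = {z:+.2f}")
```

A value within about two standard errors of the baseline is hard to distinguish from guessing; larger magnitudes suggest a real difference in either direction.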

MMLU

(unweighted average accuracy over the 57 subjects: 26.17%)

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
abstract_algebra	Yaml	none	5	acc	0.2200	±	0.0416
anatomy	Yaml	none	5	acc	0.2222	±	0.0359
astronomy	Yaml	none	5	acc	0.1776	±	0.0311
business_ethics	Yaml	none	5	acc	0.2300	±	0.0423
clinical_knowledge	Yaml	none	5	acc	0.2415	±	0.0263
college_biology	Yaml	none	5	acc	0.3194	±	0.0390
college_chemistry	Yaml	none	5	acc	0.2000	±	0.0402
college_computer_science	Yaml	none	5	acc	0.2800	±	0.0451
college_mathematics	Yaml	none	5	acc	0.2800	±	0.0451
college_medicine	Yaml	none	5	acc	0.2254	±	0.0319
college_physics	Yaml	none	5	acc	0.2157	±	0.0409
computer_security	Yaml	none	5	acc	0.2200	±	0.0416
conceptual_physics	Yaml	none	5	acc	0.2553	±	0.0285
econometrics	Yaml	none	5	acc	0.2368	±	0.0400
electrical_engineering	Yaml	none	5	acc	0.2345	±	0.0353
elementary_mathematics	Yaml	none	5	acc	0.2646	±	0.0227
formal_logic	Yaml	none	5	acc	0.2302	±	0.0376
global_facts	Yaml	none	5	acc	0.1700	±	0.0378
high_school_biology	Yaml	none	5	acc	0.2903	±	0.0258
high_school_chemistry	Yaml	none	5	acc	0.2611	±	0.0309
high_school_computer_science	Yaml	none	5	acc	0.2300	±	0.0423
high_school_european_history	Yaml	none	5	acc	0.2788	±	0.0350
high_school_geography	Yaml	none	5	acc	0.3081	±	0.0329
high_school_government_and_politics	Yaml	none	5	acc	0.3731	±	0.0349
high_school_macroeconomics	Yaml	none	5	acc	0.2923	±	0.0231
high_school_mathematics	Yaml	none	5	acc	0.2630	±	0.0268
high_school_microeconomics	Yaml	none	5	acc	0.3403	±	0.0308
high_school_physics	Yaml	none	5	acc	0.2715	±	0.0363
high_school_psychology	Yaml	none	5	acc	0.2881	±	0.0194
high_school_statistics	Yaml	none	5	acc	0.4722	±	0.0340
high_school_us_history	Yaml	none	5	acc	0.3529	±	0.0335
high_school_world_history	Yaml	none	5	acc	0.2532	±	0.0283
human_aging	Yaml	none	5	acc	0.2108	±	0.0274
human_sexuality	Yaml	none	5	acc	0.2672	±	0.0388
international_law	Yaml	none	5	acc	0.2479	±	0.0394
jurisprudence	Yaml	none	5	acc	0.2500	±	0.0419
logical_fallacies	Yaml	none	5	acc	0.2393	±	0.0335
machine_learning	Yaml	none	5	acc	0.2946	±	0.0433
management	Yaml	none	5	acc	0.1650	±	0.0368
marketing	Yaml	none	5	acc	0.1923	±	0.0258
medical_genetics	Yaml	none	5	acc	0.3000	±	0.0461
miscellaneous	Yaml	none	5	acc	0.2720	±	0.0159
moral_disputes	Yaml	none	5	acc	0.1936	±	0.0213
moral_scenarios	Yaml	none	5	acc	0.2380	±	0.0142
nutrition	Yaml	none	5	acc	0.2484	±	0.0247
philosophy	Yaml	none	5	acc	0.2283	±	0.0238
prehistory	Yaml	none	5	acc	0.2346	±	0.0236
professional_accounting	Yaml	none	5	acc	0.2589	±	0.0261
professional_law	Yaml	none	5	acc	0.2445	±	0.0110
professional_medicine	Yaml	none	5	acc	0.4485	±	0.0302
professional_psychology	Yaml	none	5	acc	0.2614	±	0.0178
public_relations	Yaml	none	5	acc	0.2364	±	0.0407
security_studies	Yaml	none	5	acc	0.4000	±	0.0314
sociology	Yaml	none	5	acc	0.3035	±	0.0325
us_foreign_policy	Yaml	none	5	acc	0.2800	±	0.0451
virology	Yaml	none	5	acc	0.2048	±	0.0314
world_religions	Yaml	none	5	acc	0.1988	±	0.0306
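The 26.17% average can be reproduced from the rows above as a plain unweighted mean of the 57 per-subject accuracies (every subject counts equally, regardless of how many questions it has). A small sketch, with the values copied from the table in order:

```python
import math

# Per-subject MMLU accuracies, in table order (abstract_algebra ... world_religions).
mmlu_acc = [
    0.2200, 0.2222, 0.1776, 0.2300, 0.2415, 0.3194, 0.2000, 0.2800,
    0.2800, 0.2254, 0.2157, 0.2200, 0.2553, 0.2368, 0.2345, 0.2646,
    0.2302, 0.1700, 0.2903, 0.2611, 0.2300, 0.2788, 0.3081, 0.3731,
    0.2923, 0.2630, 0.3403, 0.2715, 0.2881, 0.4722, 0.3529, 0.2532,
    0.2108, 0.2672, 0.2479, 0.2500, 0.2393, 0.2946, 0.1650, 0.1923,
    0.3000, 0.2720, 0.1936, 0.2380, 0.2484, 0.2283, 0.2346, 0.2589,
    0.2445, 0.4485, 0.2614, 0.2364, 0.4000, 0.3035, 0.2800, 0.2048,
    0.1988,
]

# Unweighted (macro) average across subjects.
macro_avg = math.fsum(mmlu_acc) / len(mmlu_acc)
print(f"{len(mmlu_acc)} subjects, average accuracy = {macro_avg:.2%}")
```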
