A modified GPT-2 model with only 25 million non-embedding parameters that outperforms GPT-2 (124M), Pythia-70M/160M, and Cerebras-111M on the benchmarks below. It uses ScaledSinusoidal position embeddings, an embedding LayerNorm, and no bias terms, and was trained on only 8 billion tokens of the SlimPajama dataset at home on 2x A6000 GPUs. (In the graphic it is mislabeled as cramp-41m.)
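For illustration, below is a minimal PyTorch sketch of the embedding-side changes (ScaledSinusoidal positions, embedding LayerNorm, no bias terms). It assumes the common formulation of a fixed sinusoidal table multiplied by a single learnable scale; the class names and the scale initialization are illustrative and not taken from this repo's code.

```python
import math
import torch
import torch.nn as nn

class ScaledSinusoidal(nn.Module):
    """Fixed sinusoidal position table with a single learnable scale (illustrative)."""
    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        # Assumption: scale initialized to 1/sqrt(dim); the actual repo may differ.
        self.scale = nn.Parameter(torch.tensor(1.0 / math.sqrt(dim)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings
        return x + self.scale * self.pe[: x.size(1)]

class Embeddings(nn.Module):
    """Token embeddings + scaled-sinusoidal positions + embedding LayerNorm, no biases."""
    def __init__(self, vocab_size: int, dim: int, max_len: int = 2048):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = ScaledSinusoidal(dim, max_len)
        self.norm = nn.LayerNorm(dim, bias=False)  # bias=False requires PyTorch >= 2.1

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.norm(self.pos(self.tok(ids)))
```

The same no-bias convention would also apply to the attention and MLP projections (i.e. `nn.Linear(..., bias=False)` throughout).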
OLD BENCHMARK
model | avg | arc | hellaswag | mmlu | truthfulqa |
---|---|---|---|---|---|
cramp-25m | 30.57 | 21.76 | 27.35 | 25.53 | 47.66 |
gpt2 (125m) | 30.06 | 22.1 | 31.6 | 25.86 | 40.67 |
pythia 70m deduped | 30.25 | 21.08 | 27.17 | 25.26 | 47.51 |
pythia 70m | 30.46 | 21.59 | 27.29 | 25.9 | 47.06 |
pythia 160m deduped | 31.16 | 24.06 | 30.34 | 24.95 | 44.34 |
pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |
NEW BENCHMARK
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 25 | acc | 0.1724 | ± | 0.0110 |
arc_challenge | 1 | none | 25 | acc_norm | 0.2031 | ± | 0.0118 |
truthfulqa_mc2 | 2 | none | 0 | acc | 0.4767 | ± | 0.0156 |
hellaswag | 1 | none | 10 | acc | 0.2687 | ± | 0.0044 |
hellaswag | 1 | none | 10 | acc_norm | 0.2773 | ± | 0.0045 |
winogrande | 1 | none | 5 | acc | 0.5028 | ± | 0.0141 |
MMLU
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
world_religions | 0 | none | 5 | acc | 0.1813 | ± | 0.0295 |
virology | 0 | none | 5 | acc | 0.1928 | ± | 0.0307 |
us_foreign_policy | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
sociology | 0 | none | 5 | acc | 0.2438 | ± | 0.0304 |
security_studies | 0 | none | 5 | acc | 0.2367 | ± | 0.0272 |
public_relations | 0 | none | 5 | acc | 0.2455 | ± | 0.0412 |
professional_psychology | 0 | none | 5 | acc | 0.2271 | ± | 0.0169 |
professional_medicine | 0 | none | 5 | acc | 0.4375 | ± | 0.0301 |
professional_law | 0 | none | 5 | acc | 0.2490 | ± | 0.0110 |
professional_accounting | 0 | none | 5 | acc | 0.2589 | ± | 0.0261 |
prehistory | 0 | none | 5 | acc | 0.2963 | ± | 0.0254 |
philosophy | 0 | none | 5 | acc | 0.2315 | ± | 0.0240 |
nutrition | 0 | none | 5 | acc | 0.2222 | ± | 0.0238 |
moral_scenarios | 0 | none | 5 | acc | 0.2313 | ± | 0.0141 |
moral_disputes | 0 | none | 5 | acc | 0.2168 | ± | 0.0222 |
miscellaneous | 0 | none | 5 | acc | 0.2708 | ± | 0.0159 |
medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
marketing | 0 | none | 5 | acc | 0.1923 | ± | 0.0258 |
management | 0 | none | 5 | acc | 0.1942 | ± | 0.0392 |
machine_learning | 0 | none | 5 | acc | 0.2054 | ± | 0.0383 |
logical_fallacies | 0 | none | 5 | acc | 0.2393 | ± | 0.0335 |
jurisprudence | 0 | none | 5 | acc | 0.2130 | ± | 0.0396 |
international_law | 0 | none | 5 | acc | 0.2562 | ± | 0.0398 |
human_sexuality | 0 | none | 5 | acc | 0.2366 | ± | 0.0373 |
human_aging | 0 | none | 5 | acc | 0.2063 | ± | 0.0272 |
high_school_world_history | 0 | none | 5 | acc | 0.2700 | ± | 0.0289 |
high_school_us_history | 0 | none | 5 | acc | 0.2206 | ± | 0.0291 |
high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± | 0.0340 |
high_school_psychology | 0 | none | 5 | acc | 0.2257 | ± | 0.0179 |
high_school_physics | 0 | none | 5 | acc | 0.2384 | ± | 0.0348 |
high_school_microeconomics | 0 | none | 5 | acc | 0.3403 | ± | 0.0308 |
high_school_mathematics | 0 | none | 5 | acc | 0.2630 | ± | 0.0268 |
high_school_macroeconomics | 0 | none | 5 | acc | 0.2051 | ± | 0.0205 |
high_school_government_and_politics | 0 | none | 5 | acc | 0.2280 | ± | 0.0303 |
high_school_geography | 0 | none | 5 | acc | 0.3535 | ± | 0.0341 |
high_school_european_history | 0 | none | 5 | acc | 0.2909 | ± | 0.0355 |
high_school_computer_science | 0 | none | 5 | acc | 0.2400 | ± | 0.0429 |
high_school_chemistry | 0 | none | 5 | acc | 0.2759 | ± | 0.0314 |
high_school_biology | 0 | none | 5 | acc | 0.3161 | ± | 0.0265 |
global_facts | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
formal_logic | 0 | none | 5 | acc | 0.1825 | ± | 0.0346 |
elementary_mathematics | 0 | none | 5 | acc | 0.2566 | ± | 0.0225 |
electrical_engineering | 0 | none | 5 | acc | 0.2414 | ± | 0.0357 |
econometrics | 0 | none | 5 | acc | 0.2544 | ± | 0.0410 |
conceptual_physics | 0 | none | 5 | acc | 0.2809 | ± | 0.0294 |
computer_security | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
college_physics | 0 | none | 5 | acc | 0.3431 | ± | 0.0472 |
college_medicine | 0 | none | 5 | acc | 0.2197 | ± | 0.0316 |
college_mathematics | 0 | none | 5 | acc | 0.3100 | ± | 0.0465 |
college_computer_science | 0 | none | 5 | acc | 0.3100 | ± | 0.0465 |
college_chemistry | 0 | none | 5 | acc | 0.3400 | ± | 0.0476 |
college_biology | 0 | none | 5 | acc | 0.2083 | ± | 0.0340 |
clinical_knowledge | 0 | none | 5 | acc | 0.2189 | ± | 0.0254 |
business_ethics | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
astronomy | 0 | none | 5 | acc | 0.2237 | ± | 0.0339 |
anatomy | 0 | none | 5 | acc | 0.3333 | ± | 0.0407 |
abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
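The tables above follow the lm-evaluation-harness output format. As a hedged sketch (not the exact command used for this card), results in this format can be reproduced roughly as follows; the model path is a placeholder and the exact keyword arguments may vary between harness versions.

```python
# Sketch only: reproduce one row of the tables with lm-evaluation-harness (v0.4-style Python API).
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

# Placeholder path/repo id; trust_remote_code is needed because the repo ships custom model code.
lm = HFLM(pretrained="path/to/cramp-25m", trust_remote_code=True, batch_size=8)

# arc_challenge is reported 25-shot above; rerun with the matching task name and
# n-shot value for the other rows (hellaswag: 10, winogrande: 5, mmlu: 5, truthfulqa_mc2: 0).
results = simple_evaluate(model=lm, tasks=["arc_challenge"], num_fewshot=25)
print(results["results"]["arc_challenge"])  # contains acc, acc_norm and their stderrs
```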