A modified GPT-2 model with only 25 million non-embedding parameters that outperforms GPT-2 (124M), Pythia-70M/160M, and Cerebras-111M on the benchmarks below. It uses ScaledSinusoidal position embeddings, an embedding LayerNorm, and no bias terms, and was trained on only 8 billion tokens of the SlimPajama dataset at home on 2x A6000 GPUs. (In the graphic it is mislabeled as cramp-41m.)
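For illustration, below is a minimal PyTorch sketch of the embedding-side changes (ScaledSinusoidal positions, embedding LayerNorm, no bias terms). It assumes the common formulation of a fixed sinusoidal table multiplied by a single learnable scale; the class names and the scale initialization are illustrative and not taken from this repo's code.

```python
import math
import torch
import torch.nn as nn

class ScaledSinusoidal(nn.Module):
    """Fixed sinusoidal position table with a single learnable scale (illustrative)."""
    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        # Assumption: scale initialized to 1/sqrt(dim); the actual repo may differ.
        self.scale = nn.Parameter(torch.tensor(1.0 / math.sqrt(dim)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings
        return x + self.scale * self.pe[: x.size(1)]

class Embeddings(nn.Module):
    """Token embeddings + scaled-sinusoidal positions + embedding LayerNorm, no biases."""
    def __init__(self, vocab_size: int, dim: int, max_len: int = 2048):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = ScaledSinusoidal(dim, max_len)
        self.norm = nn.LayerNorm(dim, bias=False)  # bias=False requires PyTorch >= 2.1

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.norm(self.pos(self.tok(ids)))
```

The same no-bias convention would also apply to the attention and MLP projections (i.e. `nn.Linear(..., bias=False)` throughout).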
OLD BENCHMARK
model | avg | arc | hellaswag | mmlu | truthfulqa |
---|---|---|---|---|---|
cramp-25m | 30.57 | 21.76 | 27.35 | 25.53 | 47.66 |
gpt2 (125m) | 30.06 | 22.1 | 31.6 | 25.86 | 40.67 |
pythia 70m deduped | 30.25 | 21.08 | 27.17 | 25.26 | 47.51 |
pythia 70m | 30.46 | 21.59 | 27.29 | 25.9 | 47.06 |
pythia 160m deduped | 31.16 | 24.06 | 30.34 | 24.95 | 44.34 |
pythia 160m | 30.58 | 22.78 | 30.34 | 24.95 | 44.26 |
NEW BENCHMARK
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 25 | acc | 0.1724 | ± | 0.0110 |
arc_challenge | 1 | none | 25 | acc_norm | 0.2031 | ± | 0.0118 |
truthfulqa_mc2 | 2 | none | 0 | acc | 0.4767 | ± | 0.0156 |
hellaswag | 1 | none | 10 | acc | 0.2687 | ± | 0.0044 |
hellaswag | 1 | none | 10 | acc_norm | 0.2773 | ± | 0.0045 |
winogrande | 1 | none | 5 | acc | 0.5028 | ± | 0.0141 |
MMLU
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
world_religions | 0 | none | 5 | acc | 0.1813 | ± | 0.0295 |
virology | 0 | none | 5 | acc | 0.1928 | ± | 0.0307 |
us_foreign_policy | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
sociology | 0 | none | 5 | acc | 0.2438 | ± | 0.0304 |
security_studies | 0 | none | 5 | acc | 0.2367 | ± | 0.0272 |
public_relations | 0 | none | 5 | acc | 0.2455 | ± | 0.0412 |
professional_psychology | 0 | none | 5 | acc | 0.2271 | ± | 0.0169 |
professional_medicine | 0 | none | 5 | acc | 0.4375 | ± | 0.0301 |
professional_law | 0 | none | 5 | acc | 0.2490 | ± | 0.0110 |
professional_accounting | 0 | none | 5 | acc | 0.2589 | ± | 0.0261 |
prehistory | 0 | none | 5 | acc | 0.2963 | ± | 0.0254 |
philosophy | 0 | none | 5 | acc | 0.2315 | ± | 0.0240 |
nutrition | 0 | none | 5 | acc | 0.2222 | ± | 0.0238 |
moral_scenarios | 0 | none | 5 | acc | 0.2313 | ± | 0.0141 |
moral_disputes | 0 | none | 5 | acc | 0.2168 | ± | 0.0222 |
miscellaneous | 0 | none | 5 | acc | 0.2708 | ± | 0.0159 |
medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
marketing | 0 | none | 5 | acc | 0.1923 | ± | 0.0258 |
management | 0 | none | 5 | acc | 0.1942 | ± | 0.0392 |
machine_learning | 0 | none | 5 | acc | 0.2054 | ± | 0.0383 |
logical_fallacies | 0 | none | 5 | acc | 0.2393 | ± | 0.0335 |
jurisprudence | 0 | none | 5 | acc | 0.2130 | ± | 0.0396 |
international_law | 0 | none | 5 | acc | 0.2562 | ± | 0.0398 |
human_sexuality | 0 | none | 5 | acc | 0.2366 | ± | 0.0373 |
human_aging | 0 | none | 5 | acc | 0.2063 | ± | 0.0272 |
high_school_world_history | 0 | none | 5 | acc | 0.2700 | ± | 0.0289 |
high_school_us_history | 0 | none | 5 | acc | 0.2206 | ± | 0.0291 |
high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± | 0.0340 |
high_school_psychology | 0 | none | 5 | acc | 0.2257 | ± | 0.0179 |
high_school_physics | 0 | none | 5 | acc | 0.2384 | ± | 0.0348 |
high_school_microeconomics | 0 | none | 5 | acc | 0.3403 | ± | 0.0308 |
high_school_mathematics | 0 | none | 5 | acc | 0.2630 | ± | 0.0268 |
high_school_macroeconomics | 0 | none | 5 | acc | 0.2051 | ± | 0.0205 |
high_school_government_and_politics | 0 | none | 5 | acc | 0.2280 | ± | 0.0303 |
high_school_geography | 0 | none | 5 | acc | 0.3535 | ± | 0.0341 |
high_school_european_history | 0 | none | 5 | acc | 0.2909 | ± | 0.0355 |
high_school_computer_science | 0 | none | 5 | acc | 0.2400 | ± | 0.0429 |
high_school_chemistry | 0 | none | 5 | acc | 0.2759 | ± | 0.0314 |
high_school_biology | 0 | none | 5 | acc | 0.3161 | ± | 0.0265 |
global_facts | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
formal_logic | 0 | none | 5 | acc | 0.1825 | ± | 0.0346 |
elementary_mathematics | 0 | none | 5 | acc | 0.2566 | ± | 0.0225 |
electrical_engineering | 0 | none | 5 | acc | 0.2414 | ± | 0.0357 |
econometrics | 0 | none | 5 | acc | 0.2544 | ± | 0.0410 |
conceptual_physics | 0 | none | 5 | acc | 0.2809 | ± | 0.0294 |
computer_security | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
college_physics | 0 | none | 5 | acc | 0.3431 | ± | 0.0472 |
college_medicine | 0 | none | 5 | acc | 0.2197 | ± | 0.0316 |
college_mathematics | 0 | none | 5 | acc | 0.3100 | ± | 0.0465 |
college_computer_science | 0 | none | 5 | acc | 0.3100 | ± | 0.0465 |
college_chemistry | 0 | none | 5 | acc | 0.3400 | ± | 0.0476 |
college_biology | 0 | none | 5 | acc | 0.2083 | ± | 0.0340 |
clinical_knowledge | 0 | none | 5 | acc | 0.2189 | ± | 0.0254 |
business_ethics | 0 | none | 5 | acc | 0.2000 | ± | 0.0402 |
astronomy | 0 | none | 5 | acc | 0.2237 | ± | 0.0339 |
anatomy | 0 | none | 5 | acc | 0.3333 | ± | 0.0407 |
abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
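The tables above follow the lm-evaluation-harness output format. As a hedged sketch (not the exact command used for this card), results in this format can be reproduced roughly as follows; the model path is a placeholder and the exact keyword arguments may vary between harness versions.

```python
# Sketch only: reproduce one row of the tables with lm-evaluation-harness (v0.4-style Python API).
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

# Placeholder path/repo id; trust_remote_code is needed because the repo ships custom model code.
lm = HFLM(pretrained="path/to/cramp-25m", trust_remote_code=True, batch_size=8)

# arc_challenge is reported 25-shot above; rerun with the matching task name and
# n-shot value for the other rows (hellaswag: 10, winogrande: 5, mmlu: 5, truthfulqa_mc2: 0).
results = simple_evaluate(model=lm, tasks=["arc_challenge"], num_fewshot=25)
print(results["results"]["arc_challenge"])  # contains acc, acc_norm and their stderrs
```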