A roughly 200M-parameter model (I think the parameter count shown in the graphic here is wrong, but the benchmark values are correct) with the token embedding and language modelling head of Llama2-70b attached. Linear transformations project from Llama2-70b's 8192d space down to this model's 1024d space.
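To give a rough sense of scale for the attached pieces, here is a back-of-the-envelope parameter count. It is only a sketch under assumptions not stated above: a vocabulary of 32,000 tokens (the published Llama 2 tokenizer size), untied embedding and head matrices, and one bias-free linear map on each side of the 1024d core. The variable names are illustrative.

```python
# Back-of-the-envelope sizes for the parts borrowed from Llama2-70b.
# Assumptions (not confirmed by this card): 32,000-token vocabulary,
# untied embedding / LM head, bias-free projections on both sides.
VOCAB_SIZE = 32_000      # assumed Llama 2 vocabulary size
LLAMA_DIM = 8192         # Llama2-70b hidden size (from the description)
MODEL_DIM = 1024         # this model's hidden size (from the description)

embedding_params = VOCAB_SIZE * LLAMA_DIM   # token embedding matrix
lm_head_params = LLAMA_DIM * VOCAB_SIZE     # language modelling head
down_proj_params = LLAMA_DIM * MODEL_DIM    # 8192d -> 1024d
up_proj_params = MODEL_DIM * LLAMA_DIM      # 1024d -> 8192d (assumed to exist)

attached_total = (
    embedding_params + lm_head_params + down_proj_params + up_proj_params
)

print(f"embedding:   {embedding_params:,}")
print(f"lm head:     {lm_head_params:,}")
print(f"projections: {down_proj_params + up_proj_params:,}")
print(f"total:       {attached_total:,}")
```

Under these assumptions the borrowed embedding and head alone are larger than the ~200M core, so a tool that counts every tensor in the checkpoint could plausibly report a very different total.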
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | Yaml | none | 25 | acc | 0.1775 | ± | 0.0112 |
arc_challenge | Yaml | none | 25 | acc_norm | 0.2133 | ± | 0.0120 |
truthfulqa_mc2 | Yaml | none | 0 | acc | 0.4457 | ± | 0.0152 |
winogrande | Yaml | none | 5 | acc | 0.5154 | ± | 0.014 |
hellaswag | Yaml | none | 10 | acc | 0.2832 | ± | 0.0045 |
hellaswag | Yaml | none | 10 | acc_norm | 0.3024 | ± | 0.0046 |
MMLU
(avg accuracy: 26.17%)
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
abstract_algebra | Yaml | none | 5 | acc | 0.2200 | ± | 0.0416 |
anatomy | Yaml | none | 5 | acc | 0.2222 | ± | 0.0359 |
astronomy | Yaml | none | 5 | acc | 0.1776 | ± | 0.0311 |
business_ethics | Yaml | none | 5 | acc | 0.2300 | ± | 0.0423 |
clinical_knowledge | Yaml | none | 5 | acc | 0.2415 | ± | 0.0263 |
college_biology | Yaml | none | 5 | acc | 0.3194 | ± | 0.0390 |
college_chemistry | Yaml | none | 5 | acc | 0.2000 | ± | 0.0402 |
college_computer_science | Yaml | none | 5 | acc | 0.2800 | ± | 0.0451 |
college_mathematics | Yaml | none | 5 | acc | 0.2800 | ± | 0.0451 |
college_medicine | Yaml | none | 5 | acc | 0.2254 | ± | 0.0319 |
college_physics | Yaml | none | 5 | acc | 0.2157 | ± | 0.0409 |
computer_security | Yaml | none | 5 | acc | 0.2200 | ± | 0.0416 |
conceptual_physics | Yaml | none | 5 | acc | 0.2553 | ± | 0.0285 |
econometrics | Yaml | none | 5 | acc | 0.2368 | ± | 0.0400 |
electrical_engineering | Yaml | none | 5 | acc | 0.2345 | ± | 0.0353 |
elementary_mathematics | Yaml | none | 5 | acc | 0.2646 | ± | 0.0227 |
formal_logic | Yaml | none | 5 | acc | 0.2302 | ± | 0.0376 |
global_facts | Yaml | none | 5 | acc | 0.1700 | ± | 0.0378 |
high_school_biology | Yaml | none | 5 | acc | 0.2903 | ± | 0.0258 |
high_school_chemistry | Yaml | none | 5 | acc | 0.2611 | ± | 0.0309 |
high_school_computer_science | Yaml | none | 5 | acc | 0.2300 | ± | 0.0423 |
high_school_european_history | Yaml | none | 5 | acc | 0.2788 | ± | 0.0350 |
high_school_geography | Yaml | none | 5 | acc | 0.3081 | ± | 0.0329 |
high_school_government_and_politics | Yaml | none | 5 | acc | 0.3731 | ± | 0.0349 |
high_school_macroeconomics | Yaml | none | 5 | acc | 0.2923 | ± | 0.0231 |
high_school_mathematics | Yaml | none | 5 | acc | 0.2630 | ± | 0.0268 |
high_school_microeconomics | Yaml | none | 5 | acc | 0.3403 | ± | 0.0308 |
high_school_physics | Yaml | none | 5 | acc | 0.2715 | ± | 0.0363 |
high_school_psychology | Yaml | none | 5 | acc | 0.2881 | ± | 0.0194 |
high_school_statistics | Yaml | none | 5 | acc | 0.4722 | ± | 0.0340 |
high_school_us_history | Yaml | none | 5 | acc | 0.3529 | ± | 0.0335 |
high_school_world_history | Yaml | none | 5 | acc | 0.2532 | ± | 0.0283 |
human_aging | Yaml | none | 5 | acc | 0.2108 | ± | 0.0274 |
human_sexuality | Yaml | none | 5 | acc | 0.2672 | ± | 0.0388 |
international_law | Yaml | none | 5 | acc | 0.2479 | ± | 0.0394 |
jurisprudence | Yaml | none | 5 | acc | 0.2500 | ± | 0.0419 |
logical_fallacies | Yaml | none | 5 | acc | 0.2393 | ± | 0.0335 |
machine_learning | Yaml | none | 5 | acc | 0.2946 | ± | 0.0433 |
management | Yaml | none | 5 | acc | 0.1650 | ± | 0.0368 |
marketing | Yaml | none | 5 | acc | 0.1923 | ± | 0.0258 |
medical_genetics | Yaml | none | 5 | acc | 0.3000 | ± | 0.0461 |
miscellaneous | Yaml | none | 5 | acc | 0.2720 | ± | 0.0159 |
moral_disputes | Yaml | none | 5 | acc | 0.1936 | ± | 0.0213 |
moral_scenarios | Yaml | none | 5 | acc | 0.2380 | ± | 0.0142 |
nutrition | Yaml | none | 5 | acc | 0.2484 | ± | 0.0247 |
philosophy | Yaml | none | 5 | acc | 0.2283 | ± | 0.0238 |
prehistory | Yaml | none | 5 | acc | 0.2346 | ± | 0.0236 |
professional_accounting | Yaml | none | 5 | acc | 0.2589 | ± | 0.0261 |
professional_law | Yaml | none | 5 | acc | 0.2445 | ± | 0.0110 |
professional_medicine | Yaml | none | 5 | acc | 0.4485 | ± | 0.0302 |
professional_psychology | Yaml | none | 5 | acc | 0.2614 | ± | 0.0178 |
public_relations | Yaml | none | 5 | acc | 0.2364 | ± | 0.0407 |
security_studies | Yaml | none | 5 | acc | 0.4000 | ± | 0.0314 |
sociology | Yaml | none | 5 | acc | 0.3035 | ± | 0.0325 |
us_foreign_policy | Yaml | none | 5 | acc | 0.2800 | ± | 0.0451 |
virology | Yaml | none | 5 | acc | 0.2048 | ± | 0.0314 |
world_religions | Yaml | none | 5 | acc | 0.1988 | ± | 0.0306 |
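
The 26.17% average quoted for MMLU appears to be the plain unweighted mean of the 57 per-subject accuracies in the table, rather than a mean weighted by the number of questions per subject. A minimal sketch to check this, with the values copied from the table in the same order (the variable names are illustrative):

```python
from statistics import mean

# Per-subject 5-shot MMLU accuracies, copied from the table above in row order.
mmlu_acc = [
    0.2200, 0.2222, 0.1776, 0.2300, 0.2415, 0.3194, 0.2000, 0.2800,
    0.2800, 0.2254, 0.2157, 0.2200, 0.2553, 0.2368, 0.2345, 0.2646,
    0.2302, 0.1700, 0.2903, 0.2611, 0.2300, 0.2788, 0.3081, 0.3731,
    0.2923, 0.2630, 0.3403, 0.2715, 0.2881, 0.4722, 0.3529, 0.2532,
    0.2108, 0.2672, 0.2479, 0.2500, 0.2393, 0.2946, 0.1650, 0.1923,
    0.3000, 0.2720, 0.1936, 0.2380, 0.2484, 0.2283, 0.2346, 0.2589,
    0.2445, 0.4485, 0.2614, 0.2364, 0.4000, 0.3035, 0.2800, 0.2048,
    0.1988,
]

# Unweighted (macro) average: every subject counts equally.
macro_avg = mean(mmlu_acc)

print(f"{len(mmlu_acc)} subjects, macro average = {macro_avg:.2%}")
# prints "57 subjects, macro average = 26.17%"
```

For reference, random guessing on four-way multiple choice gives 25%.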