steampunque committed
Commit f953c6c
1 Parent(s): b3ee3e9
clarify zero shot/few shot use
README.md
CHANGED
@@ -14,7 +14,7 @@ category and discipline summaries.
 Tests are run using a modified llama.cpp server (supporting logprob completion mode) and/or textsynth server where noted.
 
 METHODOLOGY:
-All CoT and
+All CoT, code, and math tests are zero shot. A few BBH tests use fewshot examples.
 Math CoT test such as GSM8K, APPLE, MATH etc. are self graded against correct answer using LLM under test
 If self grade does not work reliably (such as with very small model) the result is zeroed to mark invalid test.
 All MC tests do two queries, 1 with answers in test order and 2nd with answers circularly shifted 1.
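
The two-query MC scheme in the last METHODOLOGY line could be sketched roughly as below. This is an illustration only, not the benchmark's actual code: the function name `shifted_variants`, the shift direction (last option moves to the front), and the index remapping are assumptions, and the README does not say how the two query results are combined.

```python
from typing import List, Tuple


def shifted_variants(
    options: List[str], answer_idx: int, shift: int = 1
) -> List[Tuple[List[str], int]]:
    """Build the two MC queries: test order, then a circular shift.

    Hypothetical helper. Assumes the shift moves the last `shift`
    options to the front; the README does not state the direction.
    Returns (options, index_of_correct_answer) for each query.
    """
    n = len(options)
    k = shift % n
    # Rotate right by k positions; k == 0 leaves the order unchanged.
    rotated = options[-k:] + options[:-k] if k else list(options)
    # The correct answer moves k slots to the right, wrapping around.
    return [(list(options), answer_idx), (rotated, (answer_idx + k) % n)]


if __name__ == "__main__":
    for opts, idx in shifted_variants(["Paris", "Rome", "Oslo", "Bern"], 0):
        print(opts, "correct:", opts[idx])
```

Asking the same question with the options in two positions means a model that always favors one slot cannot get both queries right by position alone.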