Commit f953c6c by steampunque · 1 Parent(s): b3ee3e9

clarify zero shot/few shot use

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ category and discipline summaries.
 Tests are run using a modified llama.cpp server (supporting logprob completion mode) and/or textsynth server where noted.
 
 METHODOLOGY:
-All CoT and code tests are zero shot.
+All CoT, code, and math tests are zero shot. A few BBH tests use fewshot examples.
 Math CoT test such as GSM8K, APPLE, MATH etc. are self graded against correct answer using LLM under test
 If self grade does not work reliably (such as with very small model) the result is zeroed to mark invalid test.
 All MC tests do two queries, 1 with answers in test order and 2nd with answers circularly shifted 1.
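
The two-query scheme for MC tests described in the README text above can be sketched roughly as follows. This is only an illustration, not the benchmark's actual code: the function names `circular_shift` and `build_queries`, the prompt layout, the letter labels, and the direction of the shift (rotating the answers right by one) are all assumptions. The README does not say how the two results are combined, so the sketch stops at building the two prompts and their expected labels.

```python
from typing import List, Tuple

LABELS = "ABCDEFGHIJ"  # hypothetical answer labels


def circular_shift(answers: List[str], correct_index: int,
                   shift: int = 1) -> Tuple[List[str], int]:
    """Rotate the answers right by `shift`; return them with the new correct index."""
    n = len(answers)
    shift %= n
    if shift == 0:
        return list(answers), correct_index
    shifted = answers[-shift:] + answers[:-shift]
    return shifted, (correct_index + shift) % n


def build_queries(question: str, answers: List[str],
                  correct_index: int) -> List[Tuple[str, str]]:
    """Build two (prompt, expected_label) pairs: test order, then shifted by 1."""
    queries = []
    for shift in (0, 1):
        options, gold = circular_shift(answers, correct_index, shift)
        lines = [f"{LABELS[i]}. {text}" for i, text in enumerate(options)]
        prompt = question + "\n" + "\n".join(lines)
        queries.append((prompt, LABELS[gold]))
    return queries


if __name__ == "__main__":
    for prompt, gold in build_queries("2 + 2 = ?", ["3", "4", "5", "6"], 1):
        print(prompt)
        print("expected:", gold)
        print()
```

Asking the same question under two answer orderings makes it possible to tell whether a model follows the answer content or favors a fixed label position, since the correct label differs between the two prompts.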