Update README.md
README.md
CHANGED
@@ -95,6 +95,6 @@ The model achieves the following results without any fine-tuning (zero-shot):
 |arc_challenge|acc/acc_norm|0.1903/0.2270 |0.1997/0.2329 |0.4132/0.6256 |
 
 To get these results, we used the Eleuther AI evaluation harness [here](https://github.com/EleutherAI/lm-evaluation-harness).\
-We chose these 20 tasks, because they are the tasks that the GPT2 and GPT3 papers report results for
+We chose these 20 tasks, because they are the tasks that the GPT2 and GPT3 papers report results for.\
 The harness can produce results a little different than those reported in the GPT2 paper.\
 The p-values come from the stderr from the evaluation harness, plus a normal distribution assumption.
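
The README line about p-values does not spell out the exact test, so the following is only a minimal sketch of one plausible reading: a two-sided z-test on the difference between two scores, using each score's stderr and a normal approximation. The function name `two_sided_p_value` is hypothetical, the stderr values are placeholders rather than reported results, and treating the two estimates as independent is an assumption (the models are scored on the same test items, so a paired test may be closer to what was actually done).

```python
import math


def two_sided_p_value(score_a, stderr_a, score_b, stderr_b):
    """Two-sided p-value for score_a != score_b under a normal approximation.

    Assumes the two estimates are independent (an assumption, not something
    the README states).
    """
    # Standard error of the difference between two independent estimates.
    se_diff = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    z = (score_a - score_b) / se_diff
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# arc_challenge acc values are taken from the table row above;
# the stderr values (0.0115, 0.0117) are made-up placeholders.
p = two_sided_p_value(0.1903, 0.0115, 0.1997, 0.0117)
print(f"p = {p:.3f}")
```

With stderr values of this size, a gap of about one accuracy point gives a p-value well above the usual 0.05 threshold, which is the kind of comparison the p-value column is meant to support. The sketch does not guard against both stderr values being zero.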