---
license: other
---

This is the merged Hugging Face Transformers version of LLaMA 30B and OpenAssistant's RLHF 30B XOR weights:

https://huggingface.co/OpenAssistant/oasst-rlhf-2-llama-30b-7k-steps-xor
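
Not part of the original card, but a minimal sketch of how the merged weights could be loaded with Hugging Face Transformers. The local path, dtype, and prompt are assumptions; check the upstream OpenAssistant repo for the exact prompt format it was trained with.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path: point this at wherever the merged weights were saved.
model_path = "path/to/merged-oasst-rlhf-llama-30b"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # a 30B model in fp16 still needs ~60 GB of memory
    device_map="auto",          # requires `accelerate`; shards across available devices
)

# Plain prompt for illustration; OpenAssistant models expect a specific
# prompter/assistant template, so see the upstream card before serious use.
inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```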

These are the MD5 checksums I get locally, which match the ones the original repo suggests:

deb33dd4ffc3d2baddcce275a00b7c1b  ./tokenizer.json
ed59bfee4e87b9193fea5897d610ab24  ./tokenizer_config.json
704373f0c0d62be75e5f7d41d39a7e57  ./special_tokens_map.json
4c5941b4ee12dc0d8e6b5ca3f6819f4d  ./pytorch_model-00004-of-00007.bin
13a3641423840eb89f9a86507a90b2bf  ./pytorch_model.bin.index.json
d08594778f00abe70b93899628e41246  ./pytorch_model-00007-of-00007.bin
9a4d2468ecf85bf07420b200faefb4af  ./config.json
2c92d306969c427275f34b4ebf66f087  ./pytorch_model-00006-of-00007.bin
148bfd184af630a7633b4de2f41bfc49  ./generation_config.json
b6e90377103e9270cbe46b13aed288ec  ./pytorch_model-00005-of-00007.bin
27b0dc092f99aa2efaf467b2d8026c3f  ./added_tokens.json
ed991042b2a449123824f689bb94b29e  ./pytorch_model-00002-of-00007.bin
f11acc069334434d68c45a80ee899fe5  ./pytorch_model-00003-of-00007.bin
9f41bd4d5720d28567b3e7820b4a8023  ./pytorch_model-00001-of-00007.bin
eeec4125e9c7560836b4873b6f8e3025  ./tokenizer.model
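
As a sketch (not from the original card), the checksums above could be verified with Python's standard library alone; the file names and digests are copied from the list, and the script assumes it runs from the directory containing the downloaded files.

```python
# Minimal MD5 verification sketch using only the Python standard library.
import hashlib
from pathlib import Path

# Expected digests, copied from the list above.
EXPECTED = {
    "tokenizer.json": "deb33dd4ffc3d2baddcce275a00b7c1b",
    "tokenizer_config.json": "ed59bfee4e87b9193fea5897d610ab24",
    "special_tokens_map.json": "704373f0c0d62be75e5f7d41d39a7e57",
    "pytorch_model-00004-of-00007.bin": "4c5941b4ee12dc0d8e6b5ca3f6819f4d",
    "pytorch_model.bin.index.json": "13a3641423840eb89f9a86507a90b2bf",
    "pytorch_model-00007-of-00007.bin": "d08594778f00abe70b93899628e41246",
    "config.json": "9a4d2468ecf85bf07420b200faefb4af",
    "pytorch_model-00006-of-00007.bin": "2c92d306969c427275f34b4ebf66f087",
    "generation_config.json": "148bfd184af630a7633b4de2f41bfc49",
    "pytorch_model-00005-of-00007.bin": "b6e90377103e9270cbe46b13aed288ec",
    "added_tokens.json": "27b0dc092f99aa2efaf467b2d8026c3f",
    "pytorch_model-00002-of-00007.bin": "ed991042b2a449123824f689bb94b29e",
    "pytorch_model-00003-of-00007.bin": "f11acc069334434d68c45a80ee899fe5",
    "pytorch_model-00001-of-00007.bin": "9f41bd4d5720d28567b3e7820b4a8023",
    "tokenizer.model": "eeec4125e9c7560836b4873b6f8e3025",
}

def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB shards never sit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

for name, expected in EXPECTED.items():
    status = "OK " if md5sum(Path(name)) == expected else "BAD"
    print(f"{status} {name}")
```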

# Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 53.18 |
| ARC (25-shot)       | 61.35 |
| HellaSwag (10-shot) | 83.8  |
| MMLU (5-shot)       | 57.89 |
| TruthfulQA (0-shot) | 51.18 |
| Winogrande (5-shot) | 78.77 |
| GSM8K (5-shot)      | 31.46 |
| DROP (3-shot)       | 7.78  |