weqweasdas committed (verified)
Commit b29ef86 · 1 Parent(s): 0e3c5fc

Update README.md

Files changed (1):
  1. README.md +2 -0
README.md CHANGED
@@ -14,6 +14,8 @@ The model is trained from [meta-llama/Meta-Llama-3-8B](https://huggingface.co/me
 
 ## Academic Benchmarks
 
+We use the ToRA script to evaluate GSM8K and MATH, EvalPlus for HumanEval, and lm-evaluation-harness for the other benchmarks. The model is evaluated in the zero-shot setting.
+
 | **Model** | **Size** | **Method** | **LC AlpacaEval** | **MT-Bench** | **GSM-8K** | **MATH** | **MMLU** | **HumanEval** | **TruthfulQA** | **ARC** |
 |----------------------------|----------|-----------------|------------|------------|------------|----------|---------------|----------------|---------|----------|
 | LLaMA-3-8B-it | 8B | RS+DPO+PPO | 22.9 | 8.16 | 79.6 | 26.3 | 66.0 | 61.6 | 43.9 | 59.5 |
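
For reference, a minimal sketch of the zero-shot lm-evaluation-harness run described in the added line. It assumes the `lm_eval.simple_evaluate` Python API (lm-evaluation-harness v0.4+) and a placeholder model id `your-org/your-rlhf-model`, neither of which is stated in the commit; the ToRA (GSM8K/MATH) and EvalPlus (HumanEval) steps are not shown.

```python
# Sketch of the zero-shot evaluation for the benchmarks the README attributes
# to lm-evaluation-harness (MMLU, TruthfulQA, ARC). The model id below is a
# placeholder, not the repository's actual model.
import json

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # load the checkpoint via HF transformers
    model_args="pretrained=your-org/your-rlhf-model,dtype=bfloat16",
    tasks=["mmlu", "truthfulqa_mc2", "arc_challenge"],
    num_fewshot=0,                                 # zero-shot, as stated in the README
    batch_size=8,
)

# Print only the aggregated per-task metrics.
print(json.dumps(results["results"], indent=2, default=str))
```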