Update README.md
Browse files
README.md
CHANGED
@@ -14,6 +14,8 @@ The model is trained from [meta-llama/Meta-Llama-3-8B](https://huggingface.co/me
 
 ## Academic Benchmarks
 
+We use the ToRA script to evaluate GSM8K and MATH, EvalPlus for HumanEval, and lm-evaluation-harness for the other benchmarks. The model is evaluated in a zero-shot setting.
+
 | **Model** | **Size** | **Method** | **LC AlpacaEval** | **MT-Bench** | **GSM-8K** | **MATH** | **MMLU** | **HumanEval** | **TruthfulQA** | **ARC** |
 |----------------------------|----------|-----------------|-------------------|--------------|------------|----------|----------|---------------|----------------|---------|
 | LLaMA-3-8B-it | 8B | RS+DPO+PPO | 22.9 | 8.16 | 79.6 | 26.3 | 66.0 | 61.6 | 43.9 | 59.5 |