weqweasdas committed (verified)
Commit b29ef86 · 1 Parent(s): 0e3c5fc

Update README.md

Files changed (1):
  1. README.md +2 -0
README.md CHANGED
@@ -14,6 +14,8 @@ The model is trained from [meta-llama/Meta-Llama-3-8B](https://huggingface.co/me
 
 ## Academic Benchmarks
 
+We use the ToRA script to evaluate GSM8K and MATH, EvalPlus for HumanEval, and lm-evaluation-harness for the other benchmarks. The model is evaluated in the zero-shot setting.
+
 | **Model** | **Size** | **Method** | **LC AlpacaEval** | **MT-Bench** | **GSM-8K** | **MATH** | **MMLU** | **HumanEval** | **TruthfulQA** | **ARC** |
 |----------------------------|----------|-----------------|------------|------------|------------|----------|---------------|----------------|---------|----------|
 | LLaMA-3-8B-it | 8B | RS+DPO+PPO | 22.9 | 8.16 | 79.6 | 26.3 | 66.0 | 61.6 | 43.9 | 59.5 |
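
For reference, a minimal sketch of the zero-shot lm-evaluation-harness run described in the added line. It assumes the `lm_eval.simple_evaluate` Python API (lm-evaluation-harness v0.4+) and a placeholder model id `your-org/your-rlhf-model`, neither of which is stated in the commit; the ToRA (GSM8K/MATH) and EvalPlus (HumanEval) steps are not shown.

```python
# Sketch of the zero-shot evaluation for the benchmarks the README attributes
# to lm-evaluation-harness (MMLU, TruthfulQA, ARC). The model id below is a
# placeholder, not the repository's actual model.
import json

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # load the checkpoint via HF transformers
    model_args="pretrained=your-org/your-rlhf-model,dtype=bfloat16",
    tasks=["mmlu", "truthfulqa_mc2", "arc_challenge"],
    num_fewshot=0,                                 # zero-shot, as stated in the README
    batch_size=8,
)

# Print only the aggregated per-task metrics.
print(json.dumps(results["results"], indent=2, default=str))
```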