dfurman committed
Commit e136682 · 1 Parent(s): 9d3bf68

Update README.md

Files changed (1)
  1. README.md +12 -30
README.md CHANGED
@@ -21,26 +21,22 @@ base_model: meta-llama/Llama-2-7b-hf
 
  This instruction model was built via parameter-efficient QLoRA finetuning of [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the first 5k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) and the first 5k rows of [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus). Finetuning was executed on 1x A100 (40 GB SXM) for roughly 2 hours on the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
 
- ## Benchmark metrics
-
- | Metric | Value |
- |-----------------------|-------|
- | MMLU (5-shot) | 46.63 |
- | ARC (25-shot) | 51.19 |
- | HellaSwag (10-shot) | 78.92 |
- | TruthfulQA (0-shot) | 48.5 |
- | Avg. | 56.31 |
+ # Open LLM Leaderboard Evaluation Results
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-7b-instruct-peft)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 44.5 |
+ | ARC (25-shot) | 51.19 |
+ | HellaSwag (10-shot) | 78.92 |
+ | MMLU (5-shot) | 46.63 |
+ | TruthfulQA (0-shot) | 48.5 |
+ | Winogrande (5-shot) | 74.43 |
+ | GSM8K (5-shot) | 5.99 |
+ | DROP (3-shot) | 5.82 |
 
  We use the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
- ## Helpful links
-
- * Model license: coming
- * Basic usage: coming
- * Finetuning code: coming
- * Loss curves: coming
- * Runtime stats: coming
-
  ## Loss curve
 
  ![loss curve](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/sep_12_23_9_20_00_log_loss_curves_Llama-2-7b-instruct.png)
@@ -161,17 +157,3 @@ The following `bitsandbytes` quantization config was used during training:
  ## Framework versions
 
  - PEFT 0.6.0.dev0
-
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-7b-instruct-peft)
-
- | Metric | Value |
- |-----------------------|---------------------------|
- | Avg. | 44.5 |
- | ARC (25-shot) | 51.19 |
- | HellaSwag (10-shot) | 78.92 |
- | MMLU (5-shot) | 46.63 |
- | TruthfulQA (0-shot) | 48.5 |
- | Winogrande (5-shot) | 74.43 |
- | GSM8K (5-shot) | 5.99 |
- | DROP (3-shot) | 5.82 |
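The card describes parameter-efficient QLoRA finetuning of Llama-2-7b with a `bitsandbytes` quantization config and PEFT, but its "Finetuning code" link was still marked "coming" at this commit. The sketch below illustrates what such a setup typically looks like; it is not the author's script, and the LoRA hyperparameters shown are placeholder assumptions.

```python
# Illustrative QLoRA setup sketch, not the author's training script.
# Assumes transformers, peft, and bitsandbytes are installed; dataset
# preparation and the training loop are omitted. LoRA values are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Llama-2-7b-hf"

# 4-bit NF4 quantization, the usual QLoRA recipe (the exact values used are
# listed in the card's `bitsandbytes` quantization config section and may differ).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections (hyperparameters assumed).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...tokenize the ~10k dolphin + Open-Platypus rows and train with a standard
# Hugging Face Trainer (or similar) loop.
```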
 
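The benchmark rows come from EleutherAI's Language Model Evaluation Harness, pinned to the same version as the Open LLM Leaderboard. A rough sketch of reproducing one setting (ARC, 25-shot) follows; the exact API differs between harness releases, and both the `peft=` model argument and the `dfurman/llama-2-7b-instruct-peft` adapter id are assumptions inferred from the leaderboard details link.

```python
# Rough reproduction sketch for one reported setting (ARC, 25-shot).
# The leaderboard pins a specific harness version; argument names vary across
# lm-evaluation-harness releases, and the adapter repo id below is assumed.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",  # "hf-causal" in older harness releases
    model_args=(
        "pretrained=meta-llama/Llama-2-7b-hf,"
        "peft=dfurman/llama-2-7b-instruct-peft"  # assumed adapter repo id
    ),
    tasks=["arc_challenge"],
    num_fewshot=25,  # matches the "ARC (25-shot)" row in the table
)
print(results["results"]["arc_challenge"])
```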
39
 
 
 
 
 
 
 
 
 
40
  ## Loss curve
41
 
42
  ![loss curve](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/sep_12_23_9_20_00_log_loss_curves_Llama-2-7b-instruct.png)
 
157
  ## Framework versions
158
 
159
  - PEFT 0.6.0.dev0
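The card's "Basic usage" link was likewise still marked "coming". Assuming the adapter is published at `dfurman/llama-2-7b-instruct-peft` and loads on top of the base Llama-2-7b weights via PEFT, inference would look roughly like this:

```python
# Illustrative inference sketch; the prompt format shown is an assumption,
# since the card's usage section was not yet published at this commit.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "dfurman/llama-2-7b-instruct-peft"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # load the LoRA weights

prompt = "Give three tips for staying productive while working from home."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```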