dfurman committed
Commit e136682 · 1 Parent(s): 9d3bf68

Update README.md

Files changed (1)
  1. README.md +12 -30
README.md CHANGED
@@ -21,26 +21,22 @@ base_model: meta-llama/Llama-2-7b-hf
 
  This instruction model was built via parameter-efficient QLoRA finetuning of [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the first 5k rows of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) and the first 5k rows of [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus). Finetuning was executed on 1x A100 (40 GB SXM) for roughly 2 hours on the [Lambda Labs](https://cloud.lambdalabs.com/instances) platform.
 
- ## Benchmark metrics
-
- | Metric | Value |
- |-----------------------|-------|
- | MMLU (5-shot) | 46.63 |
- | ARC (25-shot) | 51.19 |
- | HellaSwag (10-shot) | 78.92 |
- | TruthfulQA (0-shot) | 48.5 |
- | Avg. | 56.31 |
+ # Open LLM Leaderboard Evaluation Results
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-7b-instruct-peft)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 44.5 |
+ | ARC (25-shot) | 51.19 |
+ | HellaSwag (10-shot) | 78.92 |
+ | MMLU (5-shot) | 46.63 |
+ | TruthfulQA (0-shot) | 48.5 |
+ | Winogrande (5-shot) | 74.43 |
+ | GSM8K (5-shot) | 5.99 |
+ | DROP (3-shot) | 5.82 |
 
  We use the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
- ## Helpful links
-
- * Model license: coming
- * Basic usage: coming
- * Finetuning code: coming
- * Loss curves: coming
- * Runtime stats: coming
-
  ## Loss curve
 
  ![loss curve](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/sep_12_23_9_20_00_log_loss_curves_Llama-2-7b-instruct.png)
@@ -161,17 +157,3 @@ The following `bitsandbytes` quantization config was used during training:
  ## Framework versions
 
  - PEFT 0.6.0.dev0
-
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__llama-2-7b-instruct-peft)
-
- | Metric | Value |
- |-----------------------|---------------------------|
- | Avg. | 44.5 |
- | ARC (25-shot) | 51.19 |
- | HellaSwag (10-shot) | 78.92 |
- | MMLU (5-shot) | 46.63 |
- | TruthfulQA (0-shot) | 48.5 |
- | Winogrande (5-shot) | 74.43 |
- | GSM8K (5-shot) | 5.99 |
- | DROP (3-shot) | 5.82 |
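The card describes parameter-efficient QLoRA finetuning of Llama-2-7b with a `bitsandbytes` quantization config and PEFT, but its "Finetuning code" link was still marked "coming" at this commit. The sketch below illustrates what such a setup typically looks like; it is not the author's script, and the LoRA hyperparameters shown are placeholder assumptions.

```python
# Illustrative QLoRA setup sketch, not the author's training script.
# Assumes transformers, peft, and bitsandbytes are installed; dataset
# preparation and the training loop are omitted. LoRA values are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Llama-2-7b-hf"

# 4-bit NF4 quantization, the usual QLoRA recipe (the exact values used are
# listed in the card's `bitsandbytes` quantization config section and may differ).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections (hyperparameters assumed).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...tokenize the ~10k dolphin + Open-Platypus rows and train with a standard
# Hugging Face Trainer (or similar) loop.
```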
 
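The benchmark rows come from EleutherAI's Language Model Evaluation Harness, pinned to the same version as the Open LLM Leaderboard. A rough sketch of reproducing one setting (ARC, 25-shot) follows; the exact API differs between harness releases, and both the `peft=` model argument and the `dfurman/llama-2-7b-instruct-peft` adapter id are assumptions inferred from the leaderboard details link.

```python
# Rough reproduction sketch for one reported setting (ARC, 25-shot).
# The leaderboard pins a specific harness version; argument names vary across
# lm-evaluation-harness releases, and the adapter repo id below is assumed.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",  # "hf-causal" in older harness releases
    model_args=(
        "pretrained=meta-llama/Llama-2-7b-hf,"
        "peft=dfurman/llama-2-7b-instruct-peft"  # assumed adapter repo id
    ),
    tasks=["arc_challenge"],
    num_fewshot=25,  # matches the "ARC (25-shot)" row in the table
)
print(results["results"]["arc_challenge"])
```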
39
 
 
 
 
 
 
 
 
 
40
  ## Loss curve
41
 
42
  ![loss curve](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/sep_12_23_9_20_00_log_loss_curves_Llama-2-7b-instruct.png)
 
157
  ## Framework versions
158
 
159
  - PEFT 0.6.0.dev0
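The card's "Basic usage" link was likewise still marked "coming". Assuming the adapter is published at `dfurman/llama-2-7b-instruct-peft` and loads on top of the base Llama-2-7b weights via PEFT, inference would look roughly like this:

```python
# Illustrative inference sketch; the prompt format shown is an assumption,
# since the card's usage section was not yet published at this commit.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "dfurman/llama-2-7b-instruct-peft"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # load the LoRA weights

prompt = "Give three tips for staying productive while working from home."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```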