---
license: apache-2.0
language:
  - en
---

# WestSeverus-7B-DPO-v2


## ☘️ Model Description

WestSeverus-7B-DPO-v2 is a WestLake-family model trained on top of WestSeverus-7B.

The model was trained on several DPO datasets and performs well on basic math problems.

WestSeverus-7B-DPO-v2 can be used in mathematics, chemistry, physics, and even coding for further research and reference.

## πŸ“– Table of Contents

1. Nous Benchmark Results
   - AGIEval
   - GPT4All
   - TruthfulQA Scores
   - BigBench
2. Open LLM Leaderboard
   - ARC
   - HellaSwag
   - MMLU
   - TruthfulQA
   - Winogrande
   - GSM8K
3. EvalPlus Leaderboard
   - HumanEval
   - HumanEval_Plus
   - MBPP
   - MBPP_Plus
4. Prompt Format
5. Quantized Models
6. Gratitude

## πŸͺ„ Nous Benchmark Results

WestSeverus-7B-DPO-v2 currently sits at the top of the YALL - Yet Another LLM Leaderboard created by CultriX, with particularly strong TruthfulQA and BigBench scores.

| Model | Average | AGIEval | GPT4All | TruthfulQA | BigBench |
|---|---|---|---|---|---|
| WestSeverus-7B-DPO-v2 | 60.98 | 45.29 | 77.2 | 72.72 | 48.71 |
| CultriX/Wernicke-7B-v1 | 60.73 | 45.59 | 77.36 | 71.46 | 48.49 |
| mlabonne/NeuralBeagle14-7B | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
| CultriX/MistralTrix-v1 | 60.05 | 44.98 | 76.62 | 71.44 | 47.17 |
| senseable/WestLake-7B-v2 | 59.42 | 44.27 | 77.86 | 67.46 | 48.09 |
| mlabonne/Daredevil-7B | 58.22 | 44.85 | 76.07 | 64.89 | 47.07 |
| microsoft/phi-2 | 44.61 | 27.96 | 70.84 | 44.46 | 35.17 |

πŸ† Open LLM Leaderboard

WestSeverus-7B-DPO-v2 is one of the top 7B models on the Open LLM Leaderboard, performing particularly well on TruthfulQA and GSM8K.

| Metric | Value |
|---|---|
| Avg. | 75.29 |
| AI2 Reasoning Challenge (25-Shot) | 71.42 |
| HellaSwag (10-Shot) | 88.27 |
| MMLU (5-Shot) | 64.79 |
| TruthfulQA (0-shot) | 72.37 |
| Winogrande (5-shot) | 83.27 |
| GSM8K (5-shot) | 71.65 |

Detailed results can be found here

## ⚑ EvalPlus Leaderboard

| Model | HumanEval | HumanEval_Plus | MBPP | MBPP_Plus |
|---|---|---|---|---|
| phi-2-2.7B | 48.2 | 43.3 | 61.9 | 51.4 |
| WestSeverus-7B-DPO-v2 | 43.3 | 34.1 | TBD | TBD |
| SOLAR-10.7B-Instruct-v1.0 | 42.1 | 34.3 | 42.9 | 34.6 |
| CodeLlama-7B | 37.8 | 34.1 | 57.6 | 45.4 |
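The scores above are pass@1 percentages. As a reference point (not part of the model release), the standard unbiased pass@k estimator used by HumanEval-style benchmarks can be sketched in a few lines of Python; `pass_at_k` is an illustrative helper name, not an EvalPlus API:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled completions, c of them correct.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least one
    of k randomly drawn samples is correct.
    """
    if n - c < k:
        # Fewer incorrect samples than k: some draw of k must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples, 4 correct: pass@1 is simply the fraction correct.
print(pass_at_k(10, 4, 1))  # β†’ 0.4
```

With k = 1 the estimator reduces to the fraction of correct samples, which is why single-sample leaderboard scores like those above can be read as plain accuracy.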


βš—οΈ Prompt Format

WestSeverus-7B-DPO-v2 was trained using the ChatML prompt template with system prompts. An example follows:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
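As an illustration, a minimal helper that renders this template in Python might look like the sketch below; `to_chatml` is a hypothetical function name, not something shipped with the model (in practice, `tokenizer.apply_chat_template` from `transformers` produces the same layout for ChatML models):

```python
def to_chatml(system_message: str, prompt: str) -> str:
    """Render a single-turn conversation in the ChatML template above,
    ending with an open assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

text = to_chatml("You are a helpful assistant.", "What is 12 * 7?")
print(text)
```

The trailing `<|im_start|>assistant\n` is important: it signals that the next tokens should be the assistant's reply.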

πŸ› οΈ Quantized Models

A quantized version of the WestSeverus model:

MaziyarPanahi/WestSeverus-7B-DPO-v2-GGUF

πŸ™ Gratitude

  • Thanks to @senseable for senseable/WestLake-7B-v2.
  • Thanks to @jondurbin for jondurbin/truthy-dpo-v0.1 dataset.
  • Thanks to @Charles Goddard for MergeKit.
  • Thanks to @TheBloke, @s3nh, @MaziyarPanahi for Quantized Models.
  • Thanks to @mlabonne, @CultriX for YALL - Yet Another LLM Leaderboard.
  • Thank you to all the other people in the Open Source AI community who utilized this model for further research and improvement.