---
license: apache-2.0
tags:
- merge
model-index:
- name: Slerp-CM-mist-dpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 69.62
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Slerp-CM-mist-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 87.09
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Slerp-CM-mist-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.81
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Slerp-CM-mist-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 62.82
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Slerp-CM-mist-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.45
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Slerp-CM-mist-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 72.78
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Slerp-CM-mist-dpo
      name: Open LLM Leaderboard
---
This model is a SLERP merge of cookinai/CatMacaroni-Slerp and mncai/mistral-7b-dpo-v5.
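
For context, SLERP (spherical linear interpolation) blends two weight tensors along the arc between them on a hypersphere rather than along a straight line, which tends to preserve the scale and geometry of the weights better than plain averaging. The sketch below is a minimal, illustrative NumPy version of the per-tensor interpolation — not mergekit's actual implementation, which handles this tensor by tensor with its own normalization and edge cases:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation: t=0 returns v0, t=1 returns v1."""
    a, b = v0.ravel(), v1.ravel()
    # Angle between the two tensors, treated as flat vectors.
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if abs(np.sin(theta)) < eps:
        # Nearly parallel tensors: fall back to ordinary linear interpolation.
        return (1 - t) * v0 + t * v1
    # Weights that move along the arc between the two directions.
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1
```

The mergekit configuration under Training Details below varies the interpolation factor `t` per layer and per tensor type rather than using a single global value.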
## Evaluation Results

### HuggingFace Leaderboard

| Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| 73.10 | 69.62 | 87.09 | 64.81 | 62.82 | 81.45 | 72.78 |
The model achieved an improvement in TruthfulQA over cookinai/CatMacaroni-Slerp and in GSM8K over mncai/mistral-7b-dpo-v5, which was the goal of the merge, and its average score is better than both parents'. It is unclear why TruthfulQA is still meaningfully lower than that of the base mncai/mistral-7b-dpo-v5.
## Training Details

The `.yaml` configuration file used with mergekit:
```yaml
slices:
  - sources:
      - model: cookinai/CatMacaroni-Slerp
        layer_range: [0, 32]
      - model: mncai/mistral-7b-dpo-v5
        layer_range: [0, 32]
merge_method: slerp
base_model: mncai/mistral-7b-dpo-v5
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: float16
```
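
The lists under `t` are gradient schedules rather than single values: mergekit spreads the anchor values across the layer stack and interpolates between them, so (under mergekit's convention that `t = 0` keeps the base model) self-attention tensors stay closer to mncai/mistral-7b-dpo-v5 in early layers while MLP tensors start closer to cookinai/CatMacaroni-Slerp. A rough sketch of how such a schedule maps to per-layer factors, assuming evenly spaced anchors with linear interpolation between them:

```python
import numpy as np

# Anchor values for self_attn tensors from the config above.
anchors = [0, 0.5, 0.3, 0.7, 1]
num_layers = 32

# Assumption: anchors are spaced evenly over normalized layer depth,
# and intermediate layers receive linearly interpolated values.
layer_pos = np.linspace(0.0, 1.0, num_layers)
anchor_pos = np.linspace(0.0, 1.0, len(anchors))
t_per_layer = np.interp(layer_pos, anchor_pos, anchors)

for i, t in enumerate(t_per_layer):
    print(f"layer {i:2d}: t = {t:.3f}")
```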
## Bias, Risks, and Limitations
The model has not been evaluated for safety and is only intended for research and experiments.
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 73.10 |
| AI2 Reasoning Challenge (25-Shot) | 69.62 |
| HellaSwag (10-Shot) | 87.09 |
| MMLU (5-Shot) | 64.81 |
| TruthfulQA (0-shot) | 62.82 |
| Winogrande (5-shot) | 81.45 |
| GSM8k (5-shot) | 72.78 |