---
base_model:
- CultriX/Qwen2.5-14B-Brocav7
- CultriX/Qwen2.5-14B-Emerged
- sometimesanotion/Lamarck-14B-v0.6
- djuna/Q2.5-Veltha-14B-0.5
- allknowingroger/QwenSlerp6-14B
- CultriX/SeQwence-14B-EvolMerge
- hotmailuser/QwenSlerp2-14B
- CultriX/Qwen2.5-14B-Hyperionv3
- CultriX/Qwen2.5-14B-Wernickev3
- qingy2024/Fusion4-14B-Instruct
library_name: transformers
tags:
- mergekit
- merge

---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [CultriX/Qwen2.5-14B-Wernickev3](https://huggingface.co/CultriX/Qwen2.5-14B-Wernickev3) as a base.
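
For intuition, the sketch below shows roughly what DARE-TIES does to a single weight tensor: each fine-tuned model contributes a task vector (its delta from the base), DARE randomly drops a fraction of each delta and rescales the survivors, and TIES keeps only the entries whose sign agrees with the per-parameter consensus before adding the result back onto the base. This is an illustrative approximation, not mergekit's implementation; the function name is hypothetical and the `normalize`/`rescale` options from the configuration below are omitted for brevity.

```python
import torch

def dare_ties_merge(base: torch.Tensor,
                    finetuned: list[torch.Tensor],
                    weights: list[float],
                    density: float,
                    lam: float = 1.0) -> torch.Tensor:
    """Illustrative DARE-TIES merge of one tensor (hypothetical helper, not mergekit code)."""
    pruned_deltas = []
    for ft in finetuned:
        delta = ft - base                              # task vector relative to the base model
        keep = torch.rand_like(delta) < density        # DARE: keep ~`density` of the entries at random
        pruned_deltas.append(torch.where(keep, delta / density, torch.zeros_like(delta)))

    # TIES: elect a per-parameter sign from the weighted deltas, then drop entries that disagree.
    stacked = torch.stack([w * d for w, d in zip(weights, pruned_deltas)])
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = torch.sign(stacked) == elected_sign
    merged_delta = (stacked * agree).sum(dim=0)

    return base + lam * merged_delta                   # `lam` corresponds to `lambda` in the config
```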

### Models Merged

The following models were included in the merge:
* [CultriX/Qwen2.5-14B-Brocav7](https://huggingface.co/CultriX/Qwen2.5-14B-Brocav7)
* [CultriX/Qwen2.5-14B-Emerged](https://huggingface.co/CultriX/Qwen2.5-14B-Emerged)
* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6)
* [djuna/Q2.5-Veltha-14B-0.5](https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5)
* [allknowingroger/QwenSlerp6-14B](https://huggingface.co/allknowingroger/QwenSlerp6-14B)
* [CultriX/SeQwence-14B-EvolMerge](https://huggingface.co/CultriX/SeQwence-14B-EvolMerge)
* [hotmailuser/QwenSlerp2-14B](https://huggingface.co/hotmailuser/QwenSlerp2-14B)
* [CultriX/Qwen2.5-14B-Hyperionv3](https://huggingface.co/CultriX/Qwen2.5-14B-Hyperionv3)
* [qingy2024/Fusion4-14B-Instruct](https://huggingface.co/qingy2024/Fusion4-14B-Instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: dare_ties  # DARE-TIES: prune-and-rescale task vectors, then merge with per-parameter sign consensus.
base_model: CultriX/Qwen2.5-14B-Wernickev3  # Backbone against which all task vectors (deltas) are computed.
dtype: bfloat16  # Efficient precision for memory usage.
out_dtype: bfloat16  # Output data type to maintain consistency and efficiency.

parameters:
  epsilon: 0.010   # Fine-tuned scaling for precise parameter adjustments.
  lambda: 2.0   # Emphasizes high-impact parameters for improved task performance.
  normalize: true  # Ensures parameter normalization for stability during merging.
  rescale: true  # Rescales parameters across models for better integration.
  int8_mask: false  # Disables int8 masking to preserve full precision.

adaptive_merge_parameters:
  task_weights:  # Weight prioritization for tasks.
    tinyArc: 1.6  # Balanced focus on logical reasoning.
    tinyHellaswag: 1.5  # Moderate priority for contextual reasoning.
    tinyMMLU: 1.8  # High priority for multi-domain knowledge tasks.
    tinyTruthfulQA: 2.2  # High emphasis on factual QA accuracy.
    tinyTruthfulQA_mc1: 1.8  # High priority for multiple-choice factual QA.
    tinyWinogrande: 1.75  # Moderate priority for commonsense coreference reasoning.
    IFEval: 2.5  # Very high priority for instruction-following tasks.
    BBH: 2.2  # High priority for complex reasoning tasks.
    MATH: 2.8  # Highest priority overall, for mathematical reasoning.
    GPQA: 2.2  # High priority for graduate-level QA tasks.
    MUSR: 2.2  # High priority for multi-step reasoning.
    MMLU-PRO: 2.0  # High priority for multitask, domain-specific knowledge.
  smoothing_factor: 0.03  # Precise blending of task-specific contributions.

gradient_clipping:  # Gradient clipping for stability during merging.
  CultriX/Qwen2.5-14B-Wernickev3: 0.89  # Stability for the base model.
  djuna/Q2.5-Veltha-14B-0.5: 0.91  # Stability for reasoning contributions.
  CultriX/SeQwence-14B-EvolMerge: 0.87  # Stabilized for multitask performance.
  qingy2024/Fusion4-14B-Instruct: 0.93  # High stability for mathematical reasoning.
  CultriX/Qwen2.5-14B-Emerged: 0.89  # Stability for multitask contributions.
  sometimesanotion/Lamarck-14B-v0.6: 0.89  # Stability for multi-step reasoning.
  allknowingroger/QwenSlerp6-14B: 0.90  # Stability for general reasoning and multitask tasks.
  hotmailuser/QwenSlerp2-14B: 0.91  # Stabilized for instruction following.
  CultriX/Qwen2.5-14B-Hyperionv3: 0.90  # Stability for this model's general performance.
  CultriX/Qwen2.5-14B-Brocav7: 0.90  # Stability for specific task contributions.

models:  # Definition of models and their weights/densities.
  - model: CultriX/Qwen2.5-14B-Wernickev3  # Base generalist model.
    parameters:
      weight: 0.28  # Balanced weight for a strong backbone.
      density: 0.78  # Slightly reduced to balance smaller contributors.

  - model: djuna/Q2.5-Veltha-14B-0.5  # Reasoning-focused model.
    parameters:
      weight: 0.27  # Slightly reduced for better balance.
      density: 0.77  # Balanced density to ensure nuanced reasoning contributions.

  - model: allknowingroger/QwenSlerp6-14B  # Strong multitask performer.
    parameters:
      weight: 0.15  # Balanced weight for generalist capabilities.
      density: 0.76  # Balanced density to maintain stable contributions.

  - model: hotmailuser/QwenSlerp2-14B  # High IFEval performer.
    parameters:
      weight: 0.12  # Maintains stable contributions for instruction-following tasks.
      density: 0.70  # Increased density to enhance integration.

  - model: CultriX/Qwen2.5-14B-Hyperionv3  # Generalist model with solid performance.
    parameters:
      weight: 0.10  # Increased for balanced general contributions.
      density: 0.75  # Balanced density for stable integration.

  - model: CultriX/Qwen2.5-14B-Brocav7  # Model for specific tasks like reasoning.
    parameters:
      weight: 0.10  # Increased weight to strengthen specific contributions.
      density: 0.76  # Increased density for better parameter preservation.

  - model: CultriX/SeQwence-14B-EvolMerge  # Multitask generalist.
    parameters:
      weight: 0.08  # Balanced weight for broader coverage.
      density: 0.68  # Slight increase for better integration.

  - model: qingy2024/Fusion4-14B-Instruct  # Specialist in mathematical reasoning.
    parameters:
      weight: 0.08  # Balanced weight for MATH tasks.
      density: 0.78  # Increased density to enhance task-specific integration.

  - model: CultriX/Qwen2.5-14B-Emerged  # General multitask model.
    parameters:
      weight: 0.08  # Balanced for multitask contributions.
      density: 0.72  # Increased density for better parameter alignment.

  - model: sometimesanotion/Lamarck-14B-v0.6  # Multi-step reasoning focus.
    parameters:
      weight: 0.05  # Slightly increased to improve its contributions.
      density: 0.65  # Increased for better parameter blending.
```
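
Once the merged weights are on the Hugging Face Hub, they should load like any other Qwen2.5 checkpoint. The snippet below is a minimal usage sketch: the repository id is a placeholder that must be replaced with this merge's actual repo id, and bfloat16 is chosen to match the `out_dtype` in the configuration above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-Merged"  # placeholder: replace with this model's actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches out_dtype in the merge configuration
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the DARE-TIES merge method in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

To reproduce the merge itself rather than just use it, the YAML above can be saved to a file and passed to mergekit's `mergekit-yaml` command-line tool.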