---
base_model:
- CultriX/Qwen2.5-14B-Brocav7
- CultriX/Qwen2.5-14B-Emerged
- sometimesanotion/Lamarck-14B-v0.6
- djuna/Q2.5-Veltha-14B-0.5
- allknowingroger/QwenSlerp6-14B
- CultriX/SeQwence-14B-EvolMerge
- hotmailuser/QwenSlerp2-14B
- CultriX/Qwen2.5-14B-Hyperionv3
- CultriX/Qwen2.5-14B-Wernickev3
- qingy2024/Fusion4-14B-Instruct
library_name: transformers
tags:
- mergekit
- merge
---
# merge
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Merge Details
### Merge Method
This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [CultriX/Qwen2.5-14B-Wernickev3](https://huggingface.co/CultriX/Qwen2.5-14B-Wernickev3) as a base.
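To reproduce the merge, the configuration in the section below can be fed to mergekit either through its `mergekit-yaml` command-line entry point or through its Python API. The sketch below follows the usage pattern documented in the mergekit repository; the config filename, output path, and option values are placeholders, and the exact API surface may differ between mergekit versions.

```python
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the merge configuration (the YAML shown in the Configuration section,
# saved to a local file -- "merge-config.yaml" is a placeholder name).
with open("merge-config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run the merge. The output path and option values here are illustrative.
run_merge(
    merge_config,
    out_path="./merged-model",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU if one is available
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```

The roughly equivalent CLI call would be `mergekit-yaml merge-config.yaml ./merged-model --cuda`.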
### Models Merged
The following models were included in the merge:
* [CultriX/Qwen2.5-14B-Brocav7](https://huggingface.co/CultriX/Qwen2.5-14B-Brocav7)
* [CultriX/Qwen2.5-14B-Emerged](https://huggingface.co/CultriX/Qwen2.5-14B-Emerged)
* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6)
* [djuna/Q2.5-Veltha-14B-0.5](https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5)
* [allknowingroger/QwenSlerp6-14B](https://huggingface.co/allknowingroger/QwenSlerp6-14B)
* [CultriX/SeQwence-14B-EvolMerge](https://huggingface.co/CultriX/SeQwence-14B-EvolMerge)
* [hotmailuser/QwenSlerp2-14B](https://huggingface.co/hotmailuser/QwenSlerp2-14B)
* [CultriX/Qwen2.5-14B-Hyperionv3](https://huggingface.co/CultriX/Qwen2.5-14B-Hyperionv3)
* [qingy2024/Fusion4-14B-Instruct](https://huggingface.co/qingy2024/Fusion4-14B-Instruct)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
merge_method: dare_ties # Merge method for dynamic, task-aware parameter blending.
base_model: CultriX/Qwen2.5-14B-Wernickev3 # Main backbone for parameter alignment.
dtype: bfloat16 # Computation precision; bfloat16 keeps memory usage low.
out_dtype: bfloat16 # Output checkpoint precision, kept consistent with dtype.
parameters:
epsilon: 0.010 # Fine-tuned scaling for precise parameter adjustments.
lambda: 2.0 # Emphasizes high-impact parameters for improved task performance.
normalize: true # Ensures parameter normalization for stability during merging.
rescale: true # Rescales parameters across models for better integration.
int8_mask: false # Disables int8 masking to preserve full precision.
adaptive_merge_parameters:
task_weights: # Weight prioritization for tasks.
tinyArc: 1.6 # Balanced focus on logical reasoning.
tinyHellaswag: 1.5 # Moderate priority for contextual reasoning.
tinyMMLU: 1.8 # High priority for multi-domain knowledge tasks.
tinyTruthfulQA: 2.2 # High emphasis on factual QA accuracy.
tinyTruthfulQA_mc1: 1.8 # High priority for multiple-choice factual QA.
tinyWinogrande: 1.75 # Moderate priority for contextual reasoning tasks.
IFEval: 2.5 # Maximum priority for instruction-following tasks.
BBH: 2.2 # High priority for complex reasoning tasks.
MATH: 2.8 # Maximum priority for mathematical reasoning.
GPQA: 2.2 # Balanced focus on graduate-level QA tasks.
MUSR: 2.2 # High priority for multi-step reasoning.
MMLU-PRO: 2.0 # High priority for multitask, domain-specific knowledge.
smoothing_factor: 0.03 # Precise blending of task-specific contributions.
gradient_clipping: # Gradient clipping for stability during merging.
CultriX/Qwen2.5-14B-Wernickev3: 0.89 # Stability for the base model.
djuna/Q2.5-Veltha-14B-0.5: 0.91 # Stability for reasoning contributions.
CultriX/SeQwence-14B-EvolMerge: 0.87 # Stabilized for multitask performance.
qingy2024/Fusion4-14B-Instruct: 0.93 # High stability for mathematical reasoning.
CultriX/Qwen2.5-14B-Emerged: 0.89 # Stability for multitask contributions.
sometimesanotion/Lamarck-14B-v0.6: 0.89 # Stability for multi-step reasoning.
allknowingroger/QwenSlerp6-14B: 0.90 # Stability for general reasoning and multitask tasks.
hotmailuser/QwenSlerp2-14B: 0.91 # Stabilized for instruction following.
CultriX/Qwen2.5-14B-Hyperionv3: 0.90 # Stability for this model's general performance.
CultriX/Qwen2.5-14B-Brocav7: 0.90 # Stability for specific task contributions.
models: # Definition of models and their weights/densities.
- model: CultriX/Qwen2.5-14B-Wernickev3 # Base generalist model.
parameters:
weight: 0.28 # Balanced weight for a strong backbone.
density: 0.78 # Slightly reduced to balance smaller contributors.
- model: djuna/Q2.5-Veltha-14B-0.5 # Reasoning-focused model.
parameters:
weight: 0.27 # Slightly reduced for better balance.
density: 0.77 # Balanced density to ensure nuanced reasoning contributions.
- model: allknowingroger/QwenSlerp6-14B # Strong multitask performer.
parameters:
weight: 0.15 # Balanced weight for generalist capabilities.
density: 0.76 # Balanced density to maintain stable contributions.
- model: hotmailuser/QwenSlerp2-14B # High IFEval performer.
parameters:
weight: 0.12 # Maintains stable contributions for instruction-following tasks.
density: 0.70 # Increased density to enhance integration.
- model: CultriX/Qwen2.5-14B-Hyperionv3 # Generalist model with solid performance.
parameters:
weight: 0.10 # Increased for balanced general contributions.
density: 0.75 # Balanced density for stable integration.
- model: CultriX/Qwen2.5-14B-Brocav7 # Model for specific tasks like reasoning.
parameters:
weight: 0.10 # Increased weight to strengthen specific contributions.
density: 0.76 # Increased density for better parameter preservation.
- model: CultriX/SeQwence-14B-EvolMerge # Multitask generalist.
parameters:
weight: 0.08 # Balanced weight for broader coverage.
density: 0.68 # Slight increase for better integration.
- model: qingy2024/Fusion4-14B-Instruct # Specialist in mathematical reasoning.
parameters:
weight: 0.08 # Balanced weight for MATH tasks.
density: 0.78 # Increased density to enhance task-specific integration.
- model: CultriX/Qwen2.5-14B-Emerged # General multitask model.
parameters:
weight: 0.08 # Balanced for multitask contributions.
density: 0.72 # Increased density for better parameter alignment.
- model: sometimesanotion/Lamarck-14B-v0.6 # Multi-step reasoning focus.
parameters:
weight: 0.05 # Slightly increased to improve its contributions.
density: 0.65 # Increased for better parameter blending.
```
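Because `normalize: true` is set, the per-model weights listed above (which add up to roughly 1.31 rather than 1.0) are rescaled during merging so that the contributions to each tensor sum to 1. The merged result is a standard Qwen2.5-14B-style causal LM and can be loaded with `transformers` like any other checkpoint. The snippet below is a minimal usage sketch: the repository id is a placeholder for wherever this merge is actually hosted, and the prompt is only illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- substitute the actual Hugging Face repo of this merge.
model_id = "CultriX/Qwen2.5-14B-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's dtype / out_dtype
    device_map="auto",
)

prompt = "Briefly explain the difference between TIES and DARE model merging."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```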