---
base_model:
- hotmailuser/QwenSlerp2-14B
- sometimesanotion/Lamarck-14B-v0.6
- CultriX/SeQwence-14B-EvolMerge
- qingy2024/Fusion4-14B-Instruct
- CultriX/Qwen2.5-14B-Hyperionv3
- CultriX/Qwen2.5-14B-Brocav7
- CultriX/Qwen2.5-14B-Brocav3
- djuna/Q2.5-Veltha-14B-0.5
library_name: transformers
tags:
- mergekit
- merge
---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [djuna/Q2.5-Veltha-14B-0.5](https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5) as the base model.

### Models Merged

The following models were included in the merge:
* [hotmailuser/QwenSlerp2-14B](https://huggingface.co/hotmailuser/QwenSlerp2-14B)
* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6)
* [CultriX/SeQwence-14B-EvolMerge](https://huggingface.co/CultriX/SeQwence-14B-EvolMerge)
* [qingy2024/Fusion4-14B-Instruct](https://huggingface.co/qingy2024/Fusion4-14B-Instruct)
* [CultriX/Qwen2.5-14B-Hyperionv3](https://huggingface.co/CultriX/Qwen2.5-14B-Hyperionv3)
* [CultriX/Qwen2.5-14B-Brocav7](https://huggingface.co/CultriX/Qwen2.5-14B-Brocav7)
* [CultriX/Qwen2.5-14B-Brocav3](https://huggingface.co/CultriX/Qwen2.5-14B-Brocav3)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: dare_ties
base_model: djuna/Q2.5-Veltha-14B-0.5
output_dtype: bfloat16
data_type: bfloat16
parameters:
  epsilon: 0.0085          # Balanced between precision and flexibility.
  lambda: 2.3              # Adjusted to emphasize impactful parameters without overfitting.
  normalize: true          # Ensures parameter normalization for stable integration.
  rescale: true            # Dynamically rescales parameters for optimal alignment.
  int8_mask: true          # Enables memory-efficient fine-tuning when applicable.
adaptive_merge_parameters:
  task_weights:            # Combined priority based on all configurations.
    tinyArc: 1.7
    tinyHellaswag: 1.8
    tinyMMLU: 2.0
    tinyTruthfulQA: 2.8
    tinyTruthfulQA_mc1: 2.4
    tinyWinogrande: 2.1
    IFEval: 3.2
    BBH: 2.9
    MATH: 3.4
    GPQA: 2.6
    MUSR: 2.7
    MMLU-PRO: 2.5
  smoothing_factor: 0.025  # Precise parameter blending.
gradient_clipping:         # Hybrid gradient clipping strategy.
  djuna/Q2.5-Veltha-14B-0.5: 0.89
  CultriX/Qwen2.5-14B-Brocav3: 0.88
  CultriX/Qwen2.5-14B-Hyperionv3: 0.87
  CultriX/Qwen2.5-14B-Wernickev3: 0.88
  hotmailuser/QwenSlerp2-14B: 0.90
  allknowingroger/QwenSlerp6-14B: 0.86
  sometimesanotion/Lamarck-14B-v0.6: 0.88
  qingy2024/Fusion4-14B-Instruct: 0.91
  CultriX/Qwen2.5-14B-Brocav7: 0.88
  CultriX/SeQwence-14B-EvolMerge: 0.87
models:
  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.28         # Backbone with strong reasoning capabilities.
      density: 0.78
  - model: CultriX/Qwen2.5-14B-Brocav3
    parameters:
      weight: 0.25         # High-performance reasoning and multitask contributions.
      density: 0.76
  - model: CultriX/Qwen2.5-14B-Hyperionv3
    parameters:
      weight: 0.18         # Balanced generalist for broad coverage.
      density: 0.75
  - model: hotmailuser/QwenSlerp2-14B
    parameters:
      weight: 0.13         # Specialist in instruction-following and QA.
      density: 0.72
  - model: sometimesanotion/Lamarck-14B-v0.6
    parameters:
      weight: 0.10         # Multi-step reasoning and task-specific expert.
      density: 0.65
  - model: qingy2024/Fusion4-14B-Instruct
    parameters:
      weight: 0.08         # Specialist in mathematical reasoning.
      density: 0.78
  - model: CultriX/Qwen2.5-14B-Brocav7
    parameters:
      weight: 0.08         # Focus on specific reasoning tasks.
      density: 0.77
  - model: CultriX/SeQwence-14B-EvolMerge
    parameters:
      weight: 0.07         # Generalist for multitask integration.
      density: 0.68
```
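
To reproduce or adapt this merge, the configuration above can be passed to mergekit directly, either through the `mergekit-yaml` CLI or through its Python entry point. The snippet below is a minimal sketch, assuming a recent mergekit release; the config file name and output path are placeholders.

```python
# Minimal sketch: run the YAML config above through mergekit's Python API.
# Assumes mergekit is installed (pip install mergekit); paths are placeholders.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "dare_ties_config.yml"   # save the YAML block above to this file
OUTPUT_PATH = "./merged-qwen2.5-14b"  # where the merged weights will be written

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU if one is available; CPU also works, just slowly
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```

The CLI equivalent is roughly `mergekit-yaml dare_ties_config.yml ./merged-qwen2.5-14b --cuda`.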
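
The result loads like any other Qwen2.5-based causal LM via `transformers`. The sketch below uses the local output path from the merge step above as a placeholder (substitute the Hub repo id of this model if you are pulling it remotely), and assumes the Qwen2.5 tokenizer and chat template were carried over from the base model.

```python
# Minimal usage sketch; the model path is a placeholder for this merge's weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./merged-qwen2.5-14b"  # or the Hugging Face repo id of this merge

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype the merge was produced in
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the difference between DARE and TIES merging."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```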