---
base_model:
- bamec66557/MNRP_0.5
- bamec66557/MISCHIEVOUS-12B
library_name: transformers
tags:
- mergekit
- merge
---

# merge
This is a merge of pre-trained language models created using mergekit.
## Merge Details
### Merge Method
This model was merged using the SLERP merge method, with bamec66557/MISCHIEVOUS-12B as the base model.
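SLERP (spherical linear interpolation) blends each pair of corresponding weight tensors along the arc between them rather than along a straight line, which keeps the merged weights at a sensible scale even when the two models' weights point in noticeably different directions. The sketch below illustrates the idea in plain NumPy; it is not mergekit's implementation, and the colinearity threshold and flattening strategy are assumptions made purely for illustration.

```python
# Illustrative sketch of SLERP between two weight tensors (not mergekit's code).
import numpy as np

def slerp(t: float, w0: np.ndarray, w1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two weight tensors with factor t in [0, 1]."""
    v0, v1 = w0.ravel(), w1.ravel()
    n0 = v0 / (np.linalg.norm(v0) + eps)
    n1 = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(n0, n1), -1.0, 1.0)
    if abs(dot) > 0.9995:                      # nearly colinear: fall back to plain linear interpolation
        return (1 - t) * w0 + t * w1
    omega = np.arccos(dot)                     # angle between the two weight directions
    sin_omega = np.sin(omega)
    out = (np.sin((1 - t) * omega) / sin_omega) * v0 + (np.sin(t * omega) / sin_omega) * v1
    return out.reshape(w0.shape)

# t = 0 keeps one endpoint's weights, t = 1 the other's; values in between blend along the arc.
a = np.random.randn(1024, 1024).astype(np.float32)
b = np.random.randn(1024, 1024).astype(np.float32)
merged = slerp(0.7, a, b)
```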
### Models Merged
The following models were included in the merge:

* bamec66557/MNRP_0.5
* bamec66557/MISCHIEVOUS-12B
### Configuration
The following YAML configuration was used to produce this model:
```yaml
slices:
  - sources:
      - model: bamec66557/MNRP_0.5
        layer_range: [0, 40]  # Layer range to merge from the MNRP_0.5 model
      - model: bamec66557/MISCHIEVOUS-12B
        layer_range: [0, 40]  # Layer range to merge from the MISCHIEVOUS-12B model

# Adjust the merge ratio per layer to drive smoother integration.
# Each filter affects a specific mechanism within the model.
parameters:
  t:
    - filter: self_attn
      value: [0.2, 0.4, 0.6, 0.8, 1.0]  # Progressive merging of self-attention layers
    - filter: mlp
      value: [0.8, 0.6, 0.4, 0.2, 0.0]  # Merge MLP layers with the opposite proportions
    - filter: layer_norm
      value: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # Layer normalisation is merged uniformly
    - value: 0.7  # Default ratio for all remaining parameters

merge_method: slerp  # Merge with the SLERP method
base_model: bamec66557/MISCHIEVOUS-12B  # Base model for the merge
dtype: bfloat16  # Data type for efficient, fast merge operations

# Additional available options
regularisation:
  - method: l2_norm  # Stabilise merged model weights with L2 normalisation
    scale: 0.01

postprocessing:
  - operation: smoothing  # Smooth the weights after merging
    kernel_size: 3
  - operation: normalise  # Normalise the overall weights
```
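The `value` lists under `t` are gradients rather than literal per-layer numbers: a short list of anchor values is spread across the merged layer range so that each of the 40 layers receives its own interpolation factor. The snippet below is a rough illustration of that expansion using plain linear interpolation; the exact spacing mergekit uses is an assumption here, not taken from its source.

```python
# Sketch: expanding a per-filter gradient such as [0.2, 0.4, 0.6, 0.8, 1.0] into one
# interpolation factor per layer. Even spacing is assumed for illustration only.
import numpy as np

def expand_gradient(values: list[float], num_layers: int) -> np.ndarray:
    """Linearly interpolate a short gradient list across num_layers layers."""
    anchor_x = np.linspace(0.0, 1.0, num=len(values))  # relative positions of the anchors
    layer_x = np.linspace(0.0, 1.0, num=num_layers)    # relative position of each layer
    return np.interp(layer_x, anchor_x, values)

self_attn_t = expand_gradient([0.2, 0.4, 0.6, 0.8, 1.0], num_layers=40)
mlp_t = expand_gradient([0.8, 0.6, 0.4, 0.2, 0.0], num_layers=40)
print(self_attn_t[:5])  # early layers use small t, later layers approach 1.0
print(mlp_t[:5])        # MLP layers move in the opposite direction
```

With these two gradients, the self-attention and MLP blocks shift between the two source models in opposite directions across the depth of the network, while `value: 0.7` covers every tensor not caught by a filter.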