merge
I hadn't tried the untuned MS3 before messing around with this merge, but I don't think it's all that different from the result. The tuned adapters do have some influence; it's just less than I expected. That might be for the better, though. The result is usable as is.
Will use this as part of upcoming merges when there is enough fuel.
Merge Details
Step1
models:
  - model: unsloth/Mistral-Small-24B-Base-2501
  - model: unsloth/Mistral-Small-24B-Instruct-2501+ToastyPigeon/new-ms-rp-test-ws
    parameters:
      select_topk:
        - value: [0.05, 0.03, 0.02, 0.02, 0.01]
  - model: unsloth/Mistral-Small-24B-Instruct-2501+estrogen/MS2501-24b-Ink-ep2-adpt
    parameters:
      select_topk: 0.1
  - model: trashpanda-org/MS-24B-Instruct-Mullein-v0
    parameters:
      select_topk: 0.4
base_model: unsloth/Mistral-Small-24B-Base-2501
merge_method: sce
parameters:
  int8_mask: true
  rescale: true
  normalize: true
dtype: bfloat16
tokenizer_source: base
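For readers unfamiliar with what select_topk controls in an sce merge, here is a toy sketch of the selection idea as I understand it: keep only the fraction of parameter positions whose deltas from the base vary the most across the merged models. This is not mergekit's code. The function name `select_topk_mask`, the flat-list representation, and the single shared fraction are all assumptions made for illustration, and the sketch does not show how the per-model values in the config above are resolved.

```python
from statistics import pvariance


def select_topk_mask(deltas, fraction):
    """Return a 0/1 mask keeping the highest-variance positions.

    deltas   -- one list per model, each holding (model - base) values
                for the same parameter positions (hypothetical layout)
    fraction -- share of positions to keep, e.g. 0.1 keeps the top 10%
    """
    n = len(deltas[0])
    # Variance of each position across the models' deltas.
    variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
    keep = max(1, int(n * fraction))
    top = sorted(range(n), key=lambda i: variances[i], reverse=True)[:keep]
    mask = [0] * n
    for i in top:
        mask[i] = 1
    return mask


# Two toy "models" with four parameters each.
toy_deltas = [
    [0.0, 1.0, 0.1, 0.0],
    [0.0, -1.0, 0.1, 0.5],
]
toy_mask = select_topk_mask(toy_deltas, 0.5)
```

With small fractions like the ones used in Step1, most positions would be masked out under this reading, which fits the impression that the tuned adapters only nudge the result.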
Step2
dtype: bfloat16
tokenizer_source: base
merge_method: della_linear
parameters:
  density: 0.55
base_model: Step1
models:
  - model: unsloth/Mistral-Small-24B-Instruct-2501
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1]
        - filter: up_proj
          value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
        - value: 0
  - model: Step1
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0]
        - filter: up_proj
          value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
        - value: 1
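The 11-element weight lists in Step2 are gradients: anchor values that get stretched across the model's layers. A minimal sketch of that stretching follows, assuming plain linear interpolation between evenly spaced anchors and a 40-layer model. Both are assumptions on my part, and mergekit's exact layer-to-position mapping may differ. The helper names `expand_gradient` and `weights_are_complementary` are hypothetical. The sketch also checks something visible in the config: for every filter, the Instruct and Step1 weights sum to 1 at each layer.

```python
def expand_gradient(anchors, num_layers):
    """Linearly interpolate a short anchor list into one value per layer."""
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, 0 .. len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


NUM_LAYERS = 40  # assumed layer count, not stated in the config

# Weight gradients copied from the Step2 config.
INSTRUCT = {
    "v_proj":    [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "o_proj":    [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1],
    "up_proj":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "gate_proj": [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "down_proj": [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}
STEP1 = {
    "v_proj":    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "o_proj":    [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    "up_proj":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "gate_proj": [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "down_proj": [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1],
}


def weights_are_complementary(a, b, num_layers, tol=1e-9):
    """True if, for every filter, the two expanded gradients sum to 1."""
    for name in a:
        wa = expand_gradient(a[name], num_layers)
        wb = expand_gradient(b[name], num_layers)
        if any(abs(x + y - 1.0) > tol for x, y in zip(wa, wb)):
            return False
    return True


complementary = weights_are_complementary(INSTRUCT, STEP1, NUM_LAYERS)
```

Under these assumptions, each filtered tensor is taken mostly from one model or the other at a given depth, with a blend only in the layers that fall between a 0 anchor and a 1 anchor. Tensors that match no filter fall through to the default weights, 0 for Instruct and 1 for Step1.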
Model tree for Nohobby/MS3-test-Merge-1
Base model
mistralai/Mistral-Small-24B-Base-2501