Nuslerp parameters?
This nuslerp is fascinating, and new to me. I like that you adopted my slices-oriented YAML, which was initially inspired by @CultriX's EvolMerge models. I've been developing it further; it helps target specific gradients at particular layer ranges, and it lets merges run with less CUDA memory.
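For context, here's a minimal sketch of what I mean by a slices-oriented config; the model names, layer ranges, gradient values, and merge method below are placeholders rather than a recommendation:

```yaml
# Sketch of a slices-oriented merge config (placeholder models and values).
# Each slice pins a layer range, so a gradient can be targeted at just that
# range, and in my experience these merges run with less CUDA memory.
slices:
  - sources:
      - model: example/model-a        # placeholder
        layer_range: [0, 24]
      - model: example/model-b        # placeholder
        layer_range: [0, 24]
    parameters:
      t: [0.0, 0.3]                   # illustrative gradient over layers 0-23
  - sources:
      - model: example/model-a
        layer_range: [24, 48]
      - model: example/model-b
        layer_range: [24, 48]
    parameters:
      t: [0.3, 0.7]                   # illustrative gradient over layers 24-47
merge_method: slerp
base_model: example/model-a
dtype: bfloat16
```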
I also see you are combining breadcrumbs with another merge strategy! Neat!
I'm curious to know if gradients like
```yaml
parameters:
  weight: [ 0.35, 30 ]
  nuslerp_flatten: false
  nuslerp_row_wise: true
```
...will give you more reliable results. I'm used to regular slerp with its t parameter!
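For comparison, this is the kind of t gradient I'm used to with regular slerp (again just a sketch, with placeholder model names and values):

```yaml
# Regular slerp with a t gradient (placeholder models and values).
slices:
  - sources:
      - model: example/model-a
        layer_range: [0, 32]
      - model: example/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: example/model-a
parameters:
  t:
    - filter: self_attn
      value: [0.0, 0.5, 0.3, 0.7, 1.0]  # illustrative per-layer gradient
    - value: 0.5                        # default for all other tensors
dtype: bfloat16
```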
There's a lot that can be said about this mathematically. I'll follow up with a "roughly reasonable" explanation based on the mathematical properties involved. It will be long, so please give me some time to put it together properly.
Congratulations! You've really got IFEval and BBH on point among the high-averaging 14B models!