Nuslerp parameters?
This nuslerp is fascinating, and new to me. I like that you adopted my slices-oriented YAML, which was initially inspired by @CultriX's EvolMerge models. I've been developing it further; it helps target specific gradients at particular layer ranges, and it lets merges run with less CUDA memory.
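For context, here's a minimal sketch of what I mean by a slices-oriented config; the model names, layer ranges, gradient values, and merge method below are placeholders rather than a recommendation:

```yaml
# Sketch of a slices-oriented merge config (placeholder models and values).
# Each slice pins a layer range, so a gradient can be targeted at just that
# range, and in my experience these merges run with less CUDA memory.
slices:
  - sources:
      - model: example/model-a        # placeholder
        layer_range: [0, 24]
      - model: example/model-b        # placeholder
        layer_range: [0, 24]
    parameters:
      t: [0.0, 0.3]                   # illustrative gradient over layers 0-23
  - sources:
      - model: example/model-a
        layer_range: [24, 48]
      - model: example/model-b
        layer_range: [24, 48]
    parameters:
      t: [0.3, 0.7]                   # illustrative gradient over layers 24-47
merge_method: slerp
base_model: example/model-a
dtype: bfloat16
```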
I also see you are combining breadcrumbs with another merge strategy! Neat!
I'm curious to know if gradients like
```yaml
parameters:
  weight: [ 0.35, 30 ]
  nuslerp_flatten: false
  nuslerp_row_wise: true
```
...will give you more reliable results. I'm used to regular slerp with its t parameter!
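For comparison, this is the kind of t gradient I'm used to with regular slerp (again just a sketch, with placeholder model names and values):

```yaml
# Regular slerp with a t gradient (placeholder models and values).
slices:
  - sources:
      - model: example/model-a
        layer_range: [0, 32]
      - model: example/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: example/model-a
parameters:
  t:
    - filter: self_attn
      value: [0.0, 0.5, 0.3, 0.7, 1.0]  # illustrative per-layer gradient
    - value: 0.5                        # default for all other tensors
dtype: bfloat16
```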
There's a lot that can be said about this mathematically. I'll follow up with a "roughly reasonable" explanation based on the mathematical properties involved. It will be long, so please give me some time to put it together properly.
Congratulations! You've really got IFEval and BBH on point among the high-averaging 14B models!