---
license: cc-by-nc-4.0
tags:
  - not-for-all-audiences
  - nsfw
---

First:

```yaml
layer_slices:
  - model: Undi95/MLewd-L2-Chat-13B
    start: 0
    end: 16
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 8
    end: 20
  - model: Undi95/MLewd-L2-Chat-13B
    start: 17
    end: 32
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 21
    end: 40
```

Inverted:

```yaml
layer_slices:
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 0
    end: 16
  - model: Undi95/MLewd-L2-Chat-13B
    start: 8
    end: 20
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 17
    end: 32
  - model: Undi95/MLewd-L2-Chat-13B
    start: 21
    end: 40
```

Precise:

```yaml
layer_slices:
  - model: Undi95/MLewd-L2-Chat-13B
    start: 0
    end: 8
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 4
    end: 12
  - model: Undi95/MLewd-L2-Chat-13B
    start: 9
    end: 16
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 13
    end: 22
  - model: Undi95/MLewd-L2-Chat-13B
    start: 17
    end: 24
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 23
    end: 32
  - model: Undi95/MLewd-L2-Chat-13B
    start: 25
    end: 32
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 33
    end: 40
```

PreciseInverted:

```yaml
layer_slices:
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 0
    end: 8
  - model: Undi95/MLewd-L2-Chat-13B
    start: 4
    end: 12
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 9
    end: 16
  - model: Undi95/MLewd-L2-Chat-13B
    start: 13
    end: 22
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 17
    end: 24
  - model: Undi95/MLewd-L2-Chat-13B
    start: 23
    end: 32
  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
    start: 25
    end: 32
  - model: Undi95/MLewd-L2-Chat-13B
    start: 33
    end: 40
```

Part1 = ReMM v2.1 merged with MLewd at a low weight to keep consistency. I call this "dilution": the result shows consistency and coherency without repetition or looping, aside from the small amount of duplicated data.
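The "dilution" step can be pictured as a plain linear blend of the two models' weights at a low donor ratio. Below is a minimal sketch of that idea; the model IDs, the output path, and the 0.1 weight are illustrative assumptions, not the exact recipe used for Part1.

```python
# Sketch of "dilution": blend a donor model into a base model at a low
# linear weight. Model IDs and alpha are illustrative, not the real recipe.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Undi95/ReMM-v2.1-L2-13B", torch_dtype=torch.float16
)
donor = AutoModelForCausalLM.from_pretrained(
    "Undi95/MLewd-L2-13B", torch_dtype=torch.float16
)

alpha = 0.1  # low donor weight keeps the base model's consistency
donor_state = donor.state_dict()

# Blend every parameter: mostly base, a small dose of donor.
merged_state = {
    name: (1 - alpha) * tensor + alpha * donor_state[name]
    for name, tensor in base.state_dict().items()
}

base.load_state_dict(merged_state)
base.save_pretrained("Part1-dilution-sketch")
```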

The goal is to find the best way to interlace the layers so as to hit a sweet spot between 13B and 30B+.

First/Inverted interleave in chunks of 16 layers, while Precise/PreciseInverted interleave in chunks of 8 layers.

All the resulting models are made of 64(+1) layers. Needs testing.
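To see how a layer_slices config stacks up, here is a minimal sketch that expands the First config into a flat layer list. It assumes end-exclusive start/end ranges; if the merge tool treats end as inclusive, the tally differs.

```python
# Expand the "First" config into a stacked layer list, assuming
# end-exclusive start/end ranges (the tool's actual convention may differ).
first = [
    ("Undi95/MLewd-L2-Chat-13B",             0, 16),
    ("Undi95/MLewd-ReMM-L2-Chat-20B-Part1",  8, 20),
    ("Undi95/MLewd-L2-Chat-13B",            17, 32),
    ("Undi95/MLewd-ReMM-L2-Chat-20B-Part1", 21, 40),
]

# Each slice contributes its layers in order; slices are concatenated.
stacked = [
    (model, layer)
    for model, start, end in first
    for layer in range(start, end)
]

print(len(stacked))    # 62 layers under the end-exclusive assumption
print(stacked[15:18])  # the seam where the first two slices meet
```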