---
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
- nsfw
---
First (Normal):
```yaml
layer_slices:
- model: Undi95/MLewd-L2-Chat-13B
start: 0
end: 16
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 8
end: 20
- model: Undi95/MLewd-L2-Chat-13B
start: 17
end: 32
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 21
end: 40
```
Inverted:
```yaml
layer_slices:
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 0
end: 16
- model: Undi95/MLewd-L2-Chat-13B
start: 8
end: 20
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 17
end: 32
- model: Undi95/MLewd-L2-Chat-13B
start: 21
end: 40
```
Precise:
```yaml
layer_slices:
- model: Undi95/MLewd-L2-Chat-13B
start: 0
end: 8
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 4
end: 12
- model: Undi95/MLewd-L2-Chat-13B
start: 9
end: 16
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 13
end: 22
- model: Undi95/MLewd-L2-Chat-13B
start: 17
end: 24
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 23
end: 32
- model: Undi95/MLewd-L2-Chat-13B
start: 25
end: 32
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 33
end: 40
```
PreciseInverted:
```yaml
layer_slices:
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 0
end: 8
- model: Undi95/MLewd-L2-Chat-13B
start: 4
end: 12
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 9
end: 16
- model: Undi95/MLewd-L2-Chat-13B
start: 13
end: 22
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 17
end: 24
- model: Undi95/MLewd-L2-Chat-13B
start: 23
end: 32
- model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
start: 25
end: 32
- model: Undi95/MLewd-L2-Chat-13B
start: 33
end: 40
```
Part1 = ReMM v2.1 merged with MLewd at a low weight to keep consistency. I call this "dilution": the result stays consistent and coherent, without repetition or looping, despite the small amount of duplicated data.
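To make "dilution" concrete, here is a minimal sketch of a plain linear (weighted-average) merge with a low weight on MLewd. This is an illustration, not the actual recipe: the 0.15 weight, the ReMM v2.1 repo path, and the use of a straight linear merge are all assumptions.
```python
# Hypothetical "dilution" sketch: a plain weighted average of two Llama-2-13B
# fine-tunes, with a low weight on MLewd. Weight and repo paths are assumptions.
import torch
from transformers import AutoModelForCausalLM

MLEWD_WEIGHT = 0.15  # assumed "low weight"; the actual value is not stated in this card

base = AutoModelForCausalLM.from_pretrained("Undi95/ReMM-v2.1-L2-13B", torch_dtype=torch.float16)  # assumed repo
lewd = AutoModelForCausalLM.from_pretrained("Undi95/MLewd-L2-Chat-13B", torch_dtype=torch.float16)

merged = base.state_dict()
for name, tensor in lewd.state_dict().items():
    # Weighted average: mostly ReMM, a little MLewd.
    merged[name] = (1.0 - MLEWD_WEIGHT) * merged[name] + MLEWD_WEIGHT * tensor

base.load_state_dict(merged)
base.save_pretrained("Part1")  # the "diluted" base used in the slice configs above
```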
The goal is to interlace the layers in the best way possible, to hit a sweet spot between 13B and 30B+.
Normal/Inverted interlace in chunks of 16 layers; Precise/PreciseInverted interlace in chunks of 8 layers.
All the resulting models have 64(+1) layers. Needs testing.
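To see exactly how a `layer_slices` config interlaces the two sources, here is a small sketch (not the actual merge tool) that expands the Normal config into the ordered layer stack. Whether `end` is inclusive is an assumption here, though the back-to-back ranges (0-16 followed by 17-32) suggest it is.
```python
# Expand a layer_slices config into the ordered (source model, layer) stack.
# This illustrates the interlacing; it is not the real merge script.
layer_slices = [
    ("Undi95/MLewd-L2-Chat-13B", 0, 16),
    ("Undi95/MLewd-ReMM-L2-Chat-20B-Part1", 8, 20),
    ("Undi95/MLewd-L2-Chat-13B", 17, 32),
    ("Undi95/MLewd-ReMM-L2-Chat-20B-Part1", 21, 40),
]

plan = [
    (model, layer)
    for model, start, end in layer_slices
    for layer in range(start, end + 1)  # assuming inclusive `end`
]

# Show where the merged stack hands over from one source model to the other.
for i, (model, layer) in enumerate(plan):
    prev = plan[i - 1][0] if i else None
    if model != prev:
        print(f"merged layer {i:3d}: switch to {model} (source layer {layer})")
```
Under this convention the script prints four handover points, one per slice, matching the 16-layer chunking described above.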
## Prompt template: Alpaca
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
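As a minimal usage sketch, the template can be filled like this (model id taken from the leaderboard link below; the prompt and generation settings are arbitrary):
```python
# Fill the Alpaca template and generate; settings are illustrative only.
from transformers import pipeline

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n### Response:\n"
)

generate = pipeline("text-generation", model="Undi95/MLewd-ReMM-L2-Chat-20B-Inverted")
out = generate(TEMPLATE.format(prompt="Write a short greeting."), max_new_tokens=64)
print(out[0]["generated_text"])
```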
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Undi95__MLewd-ReMM-L2-Chat-20B-Inverted)
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 50.81 |
| ARC (25-shot) | 61.69 |
| HellaSwag (10-shot) | 85.32 |
| MMLU (5-shot) | 58.0 |
| TruthfulQA (0-shot) | 53.77 |
| Winogrande (5-shot) | 75.61 |
| GSM8K (5-shot) | 9.1 |
| DROP (3-shot) | 12.16 |