Undi95/MLewd-ReMM-L2-Chat-20B-Inverted


	---
	license: cc-by-nc-4.0
	tags:
	- not-for-all-audiences
	- nsfw
	---

	First:
	```yaml
	layer_slices:
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 0
	    end: 16
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 8
	    end: 20
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 17
	    end: 32
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 21
	    end: 40
	```

	Inverted:
	```yaml
	layer_slices:
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 0
	    end: 16
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 8
	    end: 20
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 17
	    end: 32
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 21
	    end: 40
	```

	Precise:
	```yaml
	layer_slices:
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 0
	    end: 8
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 4
	    end: 12
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 9
	    end: 16
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 13
	    end: 22
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 17
	    end: 24
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 23
	    end: 32
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 25
	    end: 32
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 33
	    end: 40
	```

	PreciseInverted:
	```yaml
	layer_slices:
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 0
	    end: 8
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 4
	    end: 12
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 9
	    end: 16
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 13
	    end: 22
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 17
	    end: 24
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 23
	    end: 32
	  - model: Undi95/MLewd-ReMM-L2-Chat-20B-Part1
	    start: 25
	    end: 32
	  - model: Undi95/MLewd-L2-Chat-13B
	    start: 33
	    end: 40
	```

	Part1 = ReMM v2.1 merged with MLewd at a low weight to keep consistency. I call this "dilution", and the results show consistency and coherency without repetition or looping, despite the small amount of duplicated data.

	The goal is to find the best way to interlace layers and hit a sweet spot between 13B and 30B+.

	Normal (the "First" config) and Inverted interlace in chunks of 16 layers; Precise and PreciseInverted interlace in chunks of 8 layers.

	All the models are made of 64(+1) layers. Needs testing.
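
To sanity-check the layer total of a slice list, the slice sizes can simply be summed. The sketch below uses the Inverted slices from this card; the helper name `count_layers` and the `end_inclusive` flag are hypothetical, and whether `end` is exclusive or inclusive in the merge script is an assumption, so both readings are supported.

```python
# Slices copied from the "Inverted" config in this card: (model, start, end).
INVERTED = [
    ("Undi95/MLewd-ReMM-L2-Chat-20B-Part1", 0, 16),
    ("Undi95/MLewd-L2-Chat-13B", 8, 20),
    ("Undi95/MLewd-ReMM-L2-Chat-20B-Part1", 17, 32),
    ("Undi95/MLewd-L2-Chat-13B", 21, 40),
]


def count_layers(slices, end_inclusive=False):
    """Sum the number of layers taken by each slice.

    end_inclusive is an assumption switch: False treats `end` as
    exclusive (Python-style range), True counts the `end` layer too.
    """
    extra = 1 if end_inclusive else 0
    return sum(end - start + extra for _, start, end in slices)


if __name__ == "__main__":
    print("exclusive end:", count_layers(INVERTED))
    print("inclusive end:", count_layers(INVERTED, end_inclusive=True))
```

For the Inverted slices this gives 62 with exclusive ends and 66 with inclusive ends, so the exact total depends on how the merge script treats `end`. It is worth confirming against `num_hidden_layers` in the merged model's config.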

	## Prompt template: Alpaca

	```
	Below is an instruction that describes a task. Write a response that completes the request.

	### Instruction:
	{prompt}

	### Response:
	```
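
A minimal sketch of filling the template above in Python. The function name `build_prompt` is hypothetical, and ending the prompt with a single newline after `### Response:` is an assumption; adjust it if your frontend expects something different.

```python
# Alpaca template from this card; {prompt} is replaced by the instruction.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Return the full Alpaca-style prompt for one instruction."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Write a short poem about autumn."))
```

The string returned by `build_prompt` is what gets sent to the model; the generated text is expected to continue after `### Response:`.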
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Undi95__MLewd-ReMM-L2-Chat-20B-Inverted)

	| Metric              | Value |
	|---------------------|-------|
	| Avg.                | 50.81 |
	| ARC (25-shot)       | 61.69 |
	| HellaSwag (10-shot) | 85.32 |
	| MMLU (5-shot)       | 58.0  |
	| TruthfulQA (0-shot) | 53.77 |
	| Winogrande (5-shot) | 75.61 |
	| GSM8K (5-shot)      | 9.1   |
	| DROP (3-shot)       | 12.16 |
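
The "Avg." row is consistent with a plain mean of the seven benchmark scores. A short check, assuming an unweighted average (the variable names are illustrative):

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC (25-shot)": 61.69,
    "HellaSwag (10-shot)": 85.32,
    "MMLU (5-shot)": 58.0,
    "TruthfulQA (0-shot)": 53.77,
    "Winogrande (5-shot)": 75.61,
    "GSM8K (5-shot)": 9.1,
    "DROP (3-shot)": 12.16,
}

# Unweighted mean over all benchmarks.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(round(average, 2))  # 50.81
```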