---
base_model:
- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct
- rinna/llama-3-youko-8b
- rinna/llama-3-youko-8b-instruct
- tokyotech-llm/Llama-3-Swallow-8B-v0.1
- tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
- shisa-ai/shisa-v1-llama3-8b
- lmg-anon/vntl-llama3-8b-v2-qlora
library_name: transformers
tags:
- mergekit
- merge
- translation
- japanese_media
- otaku_media
- visual_novels
- VNs
language:
- en
- ja
---
# Llama-3-VNTL-Yollisa-8B
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
This merge expands on the idea of [merging at extremely low weight as an alternative to finetuning](https://huggingface.co/grimjim/kukulemon-v3-soul_mix-32k-7B), with the added step of subtracting the base model from each finetune before merging.
The instruct format is the custom llama3 variant that VNTL uses, but you should be able to mix in regular llama3 formatting as well; with the right prompt, it may even improve translation quality.
## Usage
### Presets
For SillyTavern, use [these presets](https://huggingface.co/Casual-Autopsy/Llama-3-VNTL-Yollisa-8B/tree/main/ST).
When adding prompts outside of the metadata, set the role to system and add the instruct format manually.
Because the system prompt formats are blank, you can write ST scripts that add old chat pairs to the Data Bank with the instruct format RegExed in, then inject them via RAG. I found that doing so greatly increases translation quality.
A Data Bank entry should look something like this, with the instruct format included:
```
<|start_header_id|>Japanese<|end_header_id|>
イヴ「ええ、導力を失い、か弱くなってしまったわたくしの1人や2人くらい守ってくださらないとね」<|eot_id|><|start_header_id|>English<|end_header_id|>
Yves: "Yes, I’ve lost my douryoku and become weaker. You’ll have to protect someone like me, won’t you?"<|eot_id|>
```
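For reference, a pair in this format can be assembled with a few lines of Python. This is a minimal sketch; the helper name is illustrative and not part of any library:

```python
def build_vntl_pair(japanese: str, english: str = "") -> str:
    """Assemble one Japanese -> English pair in the custom llama3-style
    format shown above (headers name languages instead of chat roles)."""
    prompt = (
        "<|start_header_id|>Japanese<|end_header_id|>\n"
        f"{japanese}<|eot_id|>"
        "<|start_header_id|>English<|end_header_id|>\n"
    )
    if english:  # completed pair, e.g. for a Data Bank entry
        prompt += f"{english}<|eot_id|>"
    return prompt
```

Leaving `english` empty produces a generation prompt that stops right after the English header, which is where the model continues.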
### Samplers
```yaml
top_k: 1
min_p: 0.15
rep_pen: 1.01
pres_pen: -0.05
rep_pen_range: 512
dyna_temp:
min: 0.7
max: 1.3
exp: 1.0
sampler_order:
- min_p
- temp
- penalties
- top_k
```
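As a rough illustration of how the min_p → temperature → top_k ordering above behaves, here is a toy numpy sketch (penalties and dynamic temperature are omitted; this is not SillyTavern's actual sampler implementation):

```python
import numpy as np

def sample_min_p_top_k(logits, min_p=0.15, top_k=1, temperature=1.0):
    """Toy filter chain in the order listed above: min_p, then
    temperature, then top_k. Returns the filtered distribution."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # min_p: drop tokens whose probability falls below min_p * p_max
    logits = np.where(probs >= min_p * probs.max(), logits, -np.inf)
    # temperature scaling
    logits = logits / temperature
    # top_k: keep only the k highest-scoring surviving tokens
    kth = np.sort(logits)[-top_k]
    logits = np.where(logits >= kth, logits, -np.inf)
    probs = np.exp(logits - logits[np.isfinite(logits)].max())
    return probs / probs.sum()
```

With `top_k: 1` the distribution collapses to the single best token, so min_p and the penalties mostly matter for which token that is after repetition penalties shift the logits.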
## Configuration
The following YAML configurations were used to produce this model:
### Llama-3-Yollow-8B
```yaml
models:
# Pivot model
- model: meta-llama/Meta-Llama-3-8B
# Target models
- model: rinna/llama-3-youko-8b
- model: tokyotech-llm/Llama-3-Swallow-8B-v0.1
merge_method: sce
base_model: meta-llama/Meta-Llama-3-8B
parameters:
select_topk: 1.0
dtype: float32
```
### Llama-3-Minus-Base-8B
```yaml
models:
# Finetune model
- model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
weight: 1.0
# Base model
- model: meta-llama/Meta-Llama-3-8B
parameters:
weight: -1.0
merge_method: task_arithmetic
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
normalize: false
dtype: float32
```
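The intent of these `weight: -1.0` configs, as described above, is to isolate what each finetune learned on top of the base model. A toy numpy illustration of that idea (illustrative only, not mergekit's exact internals):

```python
import numpy as np

# Toy stand-ins for one weight tensor of each model.
base     = np.array([1.0, 2.0, 3.0])
finetune = np.array([1.1, 2.0, 2.8])

# A weighted sum with weights +1.0 / -1.0 and normalize: false
# reduces to a plain subtraction: the finetune's delta ("task vector").
task_vector = 1.0 * finetune + (-1.0) * base

# The final TIES merge then adds such deltas back at very low weight,
# nudging the pivot model without overwriting it.
merged = base + 25e-5 * task_vector
```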
### Llama-3-Youko-Minus-Base-8B
```yaml
models:
# Finetune model
- model: rinna/llama-3-youko-8b-instruct
parameters:
weight: 1.0
# Base model
- model: meta-llama/Meta-Llama-3-8B
parameters:
weight: -1.0
merge_method: task_arithmetic
base_model: rinna/llama-3-youko-8b-instruct
parameters:
normalize: false
dtype: float32
```
### Llama-3-Swallow-Minus-Base-8B
```yaml
models:
# Finetune model
- model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
parameters:
weight: 1.0
# Base model
- model: meta-llama/Meta-Llama-3-8B
parameters:
weight: -1.0
merge_method: task_arithmetic
base_model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
parameters:
normalize: false
dtype: float32
```
### Llama-3-Shisa-Minus-Base-8B
```yaml
models:
# Finetune model
- model: shisa-ai/shisa-v1-llama3-8b
parameters:
weight: 1.0
# Base model
- model: meta-llama/Meta-Llama-3-8B
parameters:
weight: -1.0
merge_method: task_arithmetic
base_model: shisa-ai/shisa-v1-llama3-8b
parameters:
normalize: false
dtype: float32
```
### Llama-3-VNTL-Yollisa-8B
```yaml
models:
# Base
- model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
parameters:
weight: 1.0
# Models
- model: Casual-Autopsy/Llama-3-Minus-Base-8B
parameters:
density: 0.35
weight: 10e-5
- model: Casual-Autopsy/Llama-3-Shisa-Minus-Base-8B
parameters:
density: 0.85
weight: 25e-5
- model: Casual-Autopsy/Llama-3-Swallow-Minus-Base-8B
parameters:
density: 0.85
weight: 25e-5
- model: Casual-Autopsy/Llama-3-Youko-Minus-Base-8B
parameters:
density: 0.85
weight: 25e-5
merge_method: ties
base_model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
parameters:
normalize: false
int8_mask: false
dtype: float32
``` |
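For intuition, the TIES step used in this final merge (trim each delta to its top-`density` fraction by magnitude, elect a sign per parameter, then sum the agreeing values with `normalize: false`) can be sketched on toy tensors. This is a simplified illustration, not mergekit's implementation:

```python
import numpy as np

def ties_step(deltas, weights, density):
    """Toy TIES merge of several task vectors: trim, elect signs,
    then sum the entries that agree with the elected sign."""
    trimmed = []
    for d in deltas:
        # keep only the top-density fraction by magnitude
        k = max(1, int(round(density * d.size)))
        thresh = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack([w * t for w, t in zip(weights, trimmed)])
    elected = np.sign(stacked.sum(axis=0))   # majority sign per parameter
    agree = np.sign(stacked) == elected      # entries matching that sign
    return np.where(agree, stacked, 0.0).sum(axis=0)
```

Conflicting signs cancel out of the elected direction, which is why the near-identical 25e-5 weights above can stack several deltas without them fighting each other parameter by parameter.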