---
base_model:
- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct
- rinna/llama-3-youko-8b
- rinna/llama-3-youko-8b-instruct
- tokyotech-llm/Llama-3-Swallow-8B-v0.1
- tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
- shisa-ai/shisa-v1-llama3-8b
- lmg-anon/vntl-llama3-8b-v2-qlora
library_name: transformers
tags:
- mergekit
- merge
- translation
- japanese_media
- otaku_media
- visual_novels
- VNs
language:
- en
- ja
---
# Llama-3-VNTL-Yollisa-8B

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

This merge expands on the idea of [merging at extremely low weight as an alternative to finetuning](https://huggingface.co/grimjim/kukulemon-v3-soul_mix-32k-7B), with the added step of subtracting the base model from each finetune before merging.
The instruct format is the custom Llama 3 variant that VNTL uses, but you should be able to mix in regular Llama 3 formatting as well; with the right prompt, it may even improve translation quality.
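
Conceptually, each "Minus-Base" stage in the configuration section builds a task vector, and the final merge adds those vectors back onto the VNTL-adapted base at very small weights. A minimal sketch of that arithmetic in plain PyTorch, using random stand-in tensors rather than the actual mergekit pipeline:

```python
import torch

# Stand-in tensors for single checkpoint weights (real models have many of these).
base     = torch.randn(4096, 4096)                # e.g. Meta-Llama-3-8B
finetune = base + 0.01 * torch.randn(4096, 4096)  # e.g. an instruct finetune of it
pivot    = base + 0.02 * torch.randn(4096, 4096)  # e.g. the Yollow/VNTL merge below

# "Minus-Base" stage: task_arithmetic with weights 1.0 and -1.0 isolates
# what the finetune learned on top of the base.
task_vector = finetune - base

# Final stage: the task vector is added back at an extremely low weight
# (e.g. 25e-5), nudging the pivot instead of overwriting it.
merged = pivot + 25e-5 * task_vector
```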

## Usage
### Presets
For SillyTavern, use [these presets](https://huggingface.co/Casual-Autopsy/Llama-3-VNTL-Yollisa-8B/tree/main/ST).

When adding prompts outside of Metadata, set the role to system and add the instruct format manually.
Because the system prompt formats are left blank, you can write ST scripts that add old chat pairs to the Data Bank with the instruct format RegExed in and inject them back via RAG (see the sketch after the example below). I found that doing so greatly improves translation quality.

A Data Bank entry should look something like this, with the instruct format included:
```
<|start_header_id|>Japanese<|end_header_id|>

イヴ「ええ、導力を失い、か弱くなってしまったわたくしの1人や2人くらい守ってくださらないとね」<|eot_id|><|start_header_id|>English<|end_header_id|>

Yves: "Yes, I’ve lost my douryoku and become weaker. You’ll have to protect someone like me, won’t you?"<|eot_id|>
```
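
A minimal sketch of the RegEx step described above, in Python; the `Japanese:`/`English:` labels and the helper function are illustrative assumptions, not part of the ST presets:

```python
import re

# Wrap a plain "Japanese: ... / English: ..." chat pair in the instruct format
# shown above, ready to be stored as a Data Bank entry.
TEMPLATE = (
    "<|start_header_id|>Japanese<|end_header_id|>\n\n"
    "{ja}<|eot_id|><|start_header_id|>English<|end_header_id|>\n\n"
    "{en}<|eot_id|>"
)

def to_databank_entry(pair: str) -> str:
    # Adjust the patterns to however your ST script labels old chat pairs.
    ja = re.search(r"^Japanese:\s*(.+)$", pair, re.MULTILINE).group(1)
    en = re.search(r"^English:\s*(.+)$", pair, re.MULTILINE).group(1)
    return TEMPLATE.format(ja=ja, en=en)

pair = (
    "Japanese: イヴ「ええ、導力を失い、か弱くなってしまったわたくしの1人や2人くらい守ってくださらないとね」\n"
    'English: Yves: "Yes, I’ve lost my douryoku and become weaker. You’ll have to protect someone like me, won’t you?"'
)
print(to_databank_entry(pair))
```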

### Samplers
```yaml
top_k: 1
min_p: 0.15
rep_pen: 1.01
pres_pen: -0.05
rep_pen_range: 512

dyna_temp:
  min: 0.7
  max: 1.3
  exp: 1.0

sampler_order:
  - min_p
  - temp
  - penalties
  - top_k
```

## Configuration

The following YAML configurations were used to produce this model, one per merge stage:
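
Each stage can be run in dependency order with mergekit's `mergekit-yaml` command; a minimal reproduction sketch, assuming the configs are saved under these (hypothetical) filenames:

```python
import subprocess

# Later stages reference earlier outputs, so run the configs in this order.
stages = [
    ("llama-3-yollow-8b.yaml",             "Llama-3-Yollow-8B"),
    ("llama-3-minus-base-8b.yaml",         "Llama-3-Minus-Base-8B"),
    ("llama-3-youko-minus-base-8b.yaml",   "Llama-3-Youko-Minus-Base-8B"),
    ("llama-3-swallow-minus-base-8b.yaml", "Llama-3-Swallow-Minus-Base-8B"),
    ("llama-3-shisa-minus-base-8b.yaml",   "Llama-3-Shisa-Minus-Base-8B"),
    ("llama-3-vntl-yollisa-8b.yaml",       "Llama-3-VNTL-Yollisa-8B"),
]

for config, out_dir in stages:
    subprocess.run(["mergekit-yaml", config, out_dir], check=True)
```

The final stage's base model uses mergekit's `model+lora` syntax, so the VNTL QLoRA is merged into the Yollow merge before the TIES step.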

### Llama-3-Yollow-8B
```yaml
models:
  # Pivot model
  - model: meta-llama/Meta-Llama-3-8B
  # Target models
  - model: rinna/llama-3-youko-8b
  - model: tokyotech-llm/Llama-3-Swallow-8B-v0.1
merge_method: sce
base_model: meta-llama/Meta-Llama-3-8B
parameters:
  select_topk: 1.0
dtype: float32
```

### Llama-3-Minus-Base-8B
```yaml
models:
  # Finetune model
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Youko-Minus-Base-8B
```yaml
models:
  # Finetune model
  - model: rinna/llama-3-youko-8b-instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: rinna/llama-3-youko-8b-instruct
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Swallow-Minus-Base-8B
```yaml
models:
  # Finetune model
  - model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Shisa-Minus-Base-8B
```yaml
models:
  # Finetune model
  - model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: shisa-ai/shisa-v1-llama3-8b
parameters:
  normalize: false
dtype: float32
```

### Llama-3-VNTL-Yollisa-8B
```yaml
models:
  # Base
  - model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
    parameters:
      weight: 1.0
  # Models
  - model: Casual-Autopsy/Llama-3-Minus-Base-8B
    parameters:
      density: 0.35
      weight: 10e-5
  - model: Casual-Autopsy/Llama-3-Shisa-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Swallow-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Youko-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
merge_method: ties
base_model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
parameters:
  normalize: false
  int8_mask: false
dtype: float32
```