Commit cc8b00d by blockblockblock (1 parent: 38a10cc)

Upload folder using huggingface_hub

README.md ADDED
---
license: apache-2.0
base_model: mistral-community/Mixtral-8x22B-v0.1
tags:
- generated_from_trainer
- axolotl
model-index:
- name: out
  results: []
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- HuggingFaceH4/ultrachat_200k
- microsoft/orca-math-word-problems-200k
- abacusai/SystemChat-1.1
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
language:
- en
---

# Dolphin 2.9 Mixtral 8x22b 🐬

Curated and trained by Eric Hartford, Lucas Atkins, and Fernando Fernandes, with Cognitive Computations.

Discord: https://discord.gg/8fbBeC7ZGx

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

My appreciation for the sponsors of Dolphin 2.9:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node

This model is based on Mixtral-8x22B-v0.1 and is Apache-2.0 licensed.

The base model has a 64k context window, and the full-weight fine-tuning used a 4k sequence length.

Training took one week on the 8xH100 node provided by Crusoe Cloud.

This model was given a full-weight fine-tune (FFT) on 50% of its parameters (targeted with [Laser Scanner](https://github.com/cognitivecomputations/laserRMT/blob/main/laser_scanner.py) by Fernando Fernandes, David Golchinfar, Lucas Atkins, and Eric Hartford), using the ChatML prompt template format.

Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```

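The same prompt can be built programmatically from the tokenizer's bundled `chat_template` (see `tokenizer_config.json` below). A minimal sketch, assuming the upstream repository id `cognitivecomputations/dolphin-2.9-mixtral-8x22b`:

```python
# Render a ChatML prompt with the tokenizer's built-in chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cognitivecomputations/dolphin-2.9-mixtral-8x22b")

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about the sea."},
]

# add_generation_prompt=True appends the trailing "<|im_start|>assistant\n"
# shown above, so the model continues in the assistant role.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
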
Dolphin-2.9 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. I have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

Dolphin is licensed under Apache 2.0. I grant permission for any use, including commercial, that complies with the Apache-2.0 license. Dolphin was trained on data generated from GPT-4, among other models.

## Evals

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/Nb6f_dS_M6fN_v2ACK98x.png)

## Training

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: mistral-community/Mixtral-8x22B-v0.1
model_type: AutoModelForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
- model.layers.0.self_attn.q_proj
- model.layers.1.self_attn.q_proj
- model.layers.2.self_attn.q_proj
- model.layers.22.self_attn.q_proj
- model.layers.27.self_attn.q_proj
- model.layers.28.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.21.self_attn.q_proj
- model.layers.24.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.20.self_attn.q_proj
- model.layers.23.self_attn.q_proj
- model.layers.30.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.29.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.16.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.6.self_attn.k_proj
- model.layers.22.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.31.self_attn.v_proj
- model.layers.5.self_attn.v_proj
- model.layers.8.self_attn.v_proj
- model.layers.4.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.30.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.9.self_attn.v_proj
- model.layers.26.self_attn.v_proj
- model.layers.27.self_attn.v_proj
- model.layers.20.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.14.self_attn.o_proj
- model.layers.22.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.0.self_attn.o_proj
- model.layers.0.block_sparse_moe.experts.0.w1
- model.layers.1.block_sparse_moe.experts.0.w1
- model.layers.2.block_sparse_moe.experts.0.w1
- model.layers.3.block_sparse_moe.experts.0.w1
- model.layers.4.block_sparse_moe.experts.0.w1
- model.layers.5.block_sparse_moe.experts.0.w1
- model.layers.6.block_sparse_moe.experts.0.w1
- model.layers.7.block_sparse_moe.experts.0.w1
- model.layers.8.block_sparse_moe.experts.0.w1
- model.layers.9.block_sparse_moe.experts.0.w1
- model.layers.10.block_sparse_moe.experts.0.w1
- model.layers.11.block_sparse_moe.experts.0.w1
- model.layers.12.block_sparse_moe.experts.0.w1
- model.layers.13.block_sparse_moe.experts.0.w1
- model.layers.0.block_sparse_moe.experts.0.w2
- model.layers.1.block_sparse_moe.experts.0.w2
- model.layers.2.block_sparse_moe.experts.0.w2
- model.layers.3.block_sparse_moe.experts.0.w2
- model.layers.4.block_sparse_moe.experts.0.w2
- model.layers.5.block_sparse_moe.experts.0.w2
- model.layers.6.block_sparse_moe.experts.0.w2
- model.layers.7.block_sparse_moe.experts.0.w2
- model.layers.8.block_sparse_moe.experts.0.w2
- model.layers.9.block_sparse_moe.experts.0.w2
- model.layers.10.block_sparse_moe.experts.0.w2
- model.layers.11.block_sparse_moe.experts.0.w2
- model.layers.12.block_sparse_moe.experts.0.w2
- model.layers.13.block_sparse_moe.experts.0.w2
- model.layers.0.block_sparse_moe.experts.0.w3
- model.layers.1.block_sparse_moe.experts.0.w3
- model.layers.2.block_sparse_moe.experts.0.w3
- model.layers.3.block_sparse_moe.experts.0.w3
- model.layers.4.block_sparse_moe.experts.0.w3
- model.layers.5.block_sparse_moe.experts.0.w3
- model.layers.6.block_sparse_moe.experts.0.w3
- model.layers.7.block_sparse_moe.experts.0.w3
- model.layers.8.block_sparse_moe.experts.0.w3
- model.layers.9.block_sparse_moe.experts.0.w3
- model.layers.10.block_sparse_moe.experts.0.w3
- model.layers.11.block_sparse_moe.experts.0.w3
- model.layers.12.block_sparse_moe.experts.0.w3
- model.layers.13.block_sparse_moe.experts.0.w3
- model.layers.0.block_sparse_moe.experts.1.w1
- model.layers.1.block_sparse_moe.experts.1.w1
- model.layers.2.block_sparse_moe.experts.1.w1
- model.layers.3.block_sparse_moe.experts.1.w1
- model.layers.4.block_sparse_moe.experts.1.w1
- model.layers.5.block_sparse_moe.experts.1.w1
- model.layers.6.block_sparse_moe.experts.1.w1
- model.layers.7.block_sparse_moe.experts.1.w1
- model.layers.8.block_sparse_moe.experts.1.w1
- model.layers.9.block_sparse_moe.experts.1.w1
- model.layers.10.block_sparse_moe.experts.1.w1
- model.layers.11.block_sparse_moe.experts.1.w1
- model.layers.12.block_sparse_moe.experts.1.w1
- model.layers.13.block_sparse_moe.experts.1.w1
- model.layers.40.block_sparse_moe.experts.1.w2
- model.layers.0.block_sparse_moe.experts.1.w2
- model.layers.1.block_sparse_moe.experts.1.w2
- model.layers.2.block_sparse_moe.experts.1.w2
- model.layers.3.block_sparse_moe.experts.1.w2
- model.layers.4.block_sparse_moe.experts.1.w2
- model.layers.5.block_sparse_moe.experts.1.w2
- model.layers.6.block_sparse_moe.experts.1.w2
- model.layers.7.block_sparse_moe.experts.1.w2
- model.layers.8.block_sparse_moe.experts.1.w2
- model.layers.9.block_sparse_moe.experts.1.w2
- model.layers.10.block_sparse_moe.experts.1.w2
- model.layers.11.block_sparse_moe.experts.1.w2
- model.layers.12.block_sparse_moe.experts.1.w2
- model.layers.5.block_sparse_moe.experts.1.w3
- model.layers.0.block_sparse_moe.experts.1.w3
- model.layers.1.block_sparse_moe.experts.1.w3
- model.layers.2.block_sparse_moe.experts.1.w3
- model.layers.3.block_sparse_moe.experts.1.w3
- model.layers.4.block_sparse_moe.experts.1.w3
- model.layers.6.block_sparse_moe.experts.1.w3
- model.layers.7.block_sparse_moe.experts.1.w3
- model.layers.8.block_sparse_moe.experts.1.w3
- model.layers.9.block_sparse_moe.experts.1.w3
- model.layers.10.block_sparse_moe.experts.1.w3
- model.layers.11.block_sparse_moe.experts.1.w3
- model.layers.12.block_sparse_moe.experts.1.w3
- model.layers.13.block_sparse_moe.experts.1.w3
- model.layers.1.block_sparse_moe.experts.2.w1
- model.layers.0.block_sparse_moe.experts.2.w1
- model.layers.2.block_sparse_moe.experts.2.w1
- model.layers.3.block_sparse_moe.experts.2.w1
- model.layers.4.block_sparse_moe.experts.2.w1
- model.layers.5.block_sparse_moe.experts.2.w1
- model.layers.6.block_sparse_moe.experts.2.w1
- model.layers.7.block_sparse_moe.experts.2.w1
- model.layers.8.block_sparse_moe.experts.2.w1
- model.layers.9.block_sparse_moe.experts.2.w1
- model.layers.10.block_sparse_moe.experts.2.w1
- model.layers.11.block_sparse_moe.experts.2.w1
- model.layers.12.block_sparse_moe.experts.2.w1
- model.layers.13.block_sparse_moe.experts.2.w1
- model.layers.1.block_sparse_moe.experts.2.w2
- model.layers.0.block_sparse_moe.experts.2.w2
- model.layers.2.block_sparse_moe.experts.2.w2
- model.layers.3.block_sparse_moe.experts.2.w2
- model.layers.4.block_sparse_moe.experts.2.w2
- model.layers.5.block_sparse_moe.experts.2.w2
- model.layers.6.block_sparse_moe.experts.2.w2
- model.layers.7.block_sparse_moe.experts.2.w2
- model.layers.8.block_sparse_moe.experts.2.w2
- model.layers.9.block_sparse_moe.experts.2.w2
- model.layers.10.block_sparse_moe.experts.2.w2
- model.layers.11.block_sparse_moe.experts.2.w2
- model.layers.12.block_sparse_moe.experts.2.w2
- model.layers.13.block_sparse_moe.experts.2.w2
- model.layers.1.block_sparse_moe.experts.2.w3
- model.layers.0.block_sparse_moe.experts.2.w3
- model.layers.2.block_sparse_moe.experts.2.w3
- model.layers.3.block_sparse_moe.experts.2.w3
- model.layers.4.block_sparse_moe.experts.2.w3
- model.layers.5.block_sparse_moe.experts.2.w3
- model.layers.6.block_sparse_moe.experts.2.w3
- model.layers.7.block_sparse_moe.experts.2.w3
- model.layers.8.block_sparse_moe.experts.2.w3
- model.layers.9.block_sparse_moe.experts.2.w3
- model.layers.10.block_sparse_moe.experts.2.w3
- model.layers.11.block_sparse_moe.experts.2.w3
- model.layers.12.block_sparse_moe.experts.2.w3
- model.layers.13.block_sparse_moe.experts.2.w3
- model.layers.2.block_sparse_moe.experts.3.w1
- model.layers.1.block_sparse_moe.experts.3.w1
- model.layers.0.block_sparse_moe.experts.3.w1
- model.layers.3.block_sparse_moe.experts.3.w1
- model.layers.4.block_sparse_moe.experts.3.w1
- model.layers.5.block_sparse_moe.experts.3.w1
- model.layers.6.block_sparse_moe.experts.3.w1
- model.layers.7.block_sparse_moe.experts.3.w1
- model.layers.8.block_sparse_moe.experts.3.w1
- model.layers.9.block_sparse_moe.experts.3.w1
- model.layers.10.block_sparse_moe.experts.3.w1
- model.layers.11.block_sparse_moe.experts.3.w1
- model.layers.12.block_sparse_moe.experts.3.w1
- model.layers.13.block_sparse_moe.experts.3.w1
- model.layers.2.block_sparse_moe.experts.3.w2
- model.layers.1.block_sparse_moe.experts.3.w2
- model.layers.0.block_sparse_moe.experts.3.w2
- model.layers.3.block_sparse_moe.experts.3.w2
- model.layers.4.block_sparse_moe.experts.3.w2
- model.layers.5.block_sparse_moe.experts.3.w2
- model.layers.6.block_sparse_moe.experts.3.w2
- model.layers.7.block_sparse_moe.experts.3.w2
- model.layers.8.block_sparse_moe.experts.3.w2
- model.layers.9.block_sparse_moe.experts.3.w2
- model.layers.10.block_sparse_moe.experts.3.w2
- model.layers.11.block_sparse_moe.experts.3.w2
- model.layers.12.block_sparse_moe.experts.3.w2
- model.layers.13.block_sparse_moe.experts.3.w2
- model.layers.2.block_sparse_moe.experts.3.w3
- model.layers.1.block_sparse_moe.experts.3.w3
- model.layers.0.block_sparse_moe.experts.3.w3
- model.layers.3.block_sparse_moe.experts.3.w3
- model.layers.4.block_sparse_moe.experts.3.w3
- model.layers.5.block_sparse_moe.experts.3.w3
- model.layers.6.block_sparse_moe.experts.3.w3
- model.layers.7.block_sparse_moe.experts.3.w3
- model.layers.8.block_sparse_moe.experts.3.w3
- model.layers.9.block_sparse_moe.experts.3.w3
- model.layers.10.block_sparse_moe.experts.3.w3
- model.layers.11.block_sparse_moe.experts.3.w3
- model.layers.12.block_sparse_moe.experts.3.w3
- model.layers.13.block_sparse_moe.experts.3.w3
- model.layers.3.block_sparse_moe.experts.4.w1
- model.layers.2.block_sparse_moe.experts.4.w1
- model.layers.1.block_sparse_moe.experts.4.w1
- model.layers.0.block_sparse_moe.experts.4.w1
- model.layers.4.block_sparse_moe.experts.4.w1
- model.layers.5.block_sparse_moe.experts.4.w1
- model.layers.6.block_sparse_moe.experts.4.w1
- model.layers.7.block_sparse_moe.experts.4.w1
- model.layers.8.block_sparse_moe.experts.4.w1
- model.layers.9.block_sparse_moe.experts.4.w1
- model.layers.10.block_sparse_moe.experts.4.w1
- model.layers.11.block_sparse_moe.experts.4.w1
- model.layers.12.block_sparse_moe.experts.4.w1
- model.layers.13.block_sparse_moe.experts.4.w1
- model.layers.2.block_sparse_moe.experts.4.w2
- model.layers.3.block_sparse_moe.experts.4.w2
- model.layers.1.block_sparse_moe.experts.4.w2
- model.layers.20.block_sparse_moe.experts.4.w2
- model.layers.0.block_sparse_moe.experts.4.w2
- model.layers.4.block_sparse_moe.experts.4.w2
- model.layers.5.block_sparse_moe.experts.4.w2
- model.layers.6.block_sparse_moe.experts.4.w2
- model.layers.7.block_sparse_moe.experts.4.w2
- model.layers.8.block_sparse_moe.experts.4.w2
- model.layers.9.block_sparse_moe.experts.4.w2
- model.layers.10.block_sparse_moe.experts.4.w2
- model.layers.11.block_sparse_moe.experts.4.w2
- model.layers.12.block_sparse_moe.experts.4.w2
- model.layers.3.block_sparse_moe.experts.4.w3
- model.layers.2.block_sparse_moe.experts.4.w3
- model.layers.1.block_sparse_moe.experts.4.w3
- model.layers.0.block_sparse_moe.experts.4.w3
- model.layers.4.block_sparse_moe.experts.4.w3
- model.layers.5.block_sparse_moe.experts.4.w3
- model.layers.6.block_sparse_moe.experts.4.w3
- model.layers.7.block_sparse_moe.experts.4.w3
- model.layers.8.block_sparse_moe.experts.4.w3
- model.layers.9.block_sparse_moe.experts.4.w3
- model.layers.10.block_sparse_moe.experts.4.w3
- model.layers.11.block_sparse_moe.experts.4.w3
- model.layers.12.block_sparse_moe.experts.4.w3
- model.layers.13.block_sparse_moe.experts.4.w3
- model.layers.4.block_sparse_moe.experts.5.w1
- model.layers.3.block_sparse_moe.experts.5.w1
- model.layers.2.block_sparse_moe.experts.5.w1
- model.layers.1.block_sparse_moe.experts.5.w1
- model.layers.0.block_sparse_moe.experts.5.w1
- model.layers.5.block_sparse_moe.experts.5.w1
- model.layers.6.block_sparse_moe.experts.5.w1
- model.layers.7.block_sparse_moe.experts.5.w1
- model.layers.8.block_sparse_moe.experts.5.w1
- model.layers.9.block_sparse_moe.experts.5.w1
- model.layers.10.block_sparse_moe.experts.5.w1
- model.layers.11.block_sparse_moe.experts.5.w1
- model.layers.12.block_sparse_moe.experts.5.w1
- model.layers.13.block_sparse_moe.experts.5.w1
- model.layers.4.block_sparse_moe.experts.5.w2
- model.layers.2.block_sparse_moe.experts.5.w2
- model.layers.3.block_sparse_moe.experts.5.w2
- model.layers.1.block_sparse_moe.experts.5.w2
- model.layers.0.block_sparse_moe.experts.5.w2
- model.layers.5.block_sparse_moe.experts.5.w2
- model.layers.6.block_sparse_moe.experts.5.w2
- model.layers.7.block_sparse_moe.experts.5.w2
- model.layers.8.block_sparse_moe.experts.5.w2
- model.layers.9.block_sparse_moe.experts.5.w2
- model.layers.10.block_sparse_moe.experts.5.w2
- model.layers.11.block_sparse_moe.experts.5.w2
- model.layers.12.block_sparse_moe.experts.5.w2
- model.layers.13.block_sparse_moe.experts.5.w2
- model.layers.4.block_sparse_moe.experts.5.w3
- model.layers.3.block_sparse_moe.experts.5.w3
- model.layers.2.block_sparse_moe.experts.5.w3
- model.layers.1.block_sparse_moe.experts.5.w3
- model.layers.0.block_sparse_moe.experts.5.w3
- model.layers.5.block_sparse_moe.experts.5.w3
- model.layers.6.block_sparse_moe.experts.5.w3
- model.layers.7.block_sparse_moe.experts.5.w3
- model.layers.8.block_sparse_moe.experts.5.w3
- model.layers.9.block_sparse_moe.experts.5.w3
- model.layers.10.block_sparse_moe.experts.5.w3
- model.layers.11.block_sparse_moe.experts.5.w3
- model.layers.12.block_sparse_moe.experts.5.w3
- model.layers.13.block_sparse_moe.experts.5.w3
- model.layers.5.block_sparse_moe.experts.6.w1
- model.layers.4.block_sparse_moe.experts.6.w1
- model.layers.3.block_sparse_moe.experts.6.w1
- model.layers.2.block_sparse_moe.experts.6.w1
- model.layers.1.block_sparse_moe.experts.6.w1
- model.layers.0.block_sparse_moe.experts.6.w1
- model.layers.6.block_sparse_moe.experts.6.w1
- model.layers.7.block_sparse_moe.experts.6.w1
- model.layers.8.block_sparse_moe.experts.6.w1
- model.layers.9.block_sparse_moe.experts.6.w1
- model.layers.10.block_sparse_moe.experts.6.w1
- model.layers.11.block_sparse_moe.experts.6.w1
- model.layers.12.block_sparse_moe.experts.6.w1
- model.layers.13.block_sparse_moe.experts.6.w1
- model.layers.5.block_sparse_moe.experts.6.w2
- model.layers.4.block_sparse_moe.experts.6.w2
- model.layers.2.block_sparse_moe.experts.6.w2
- model.layers.3.block_sparse_moe.experts.6.w2
- model.layers.1.block_sparse_moe.experts.6.w2
- model.layers.0.block_sparse_moe.experts.6.w2
- model.layers.6.block_sparse_moe.experts.6.w2
- model.layers.7.block_sparse_moe.experts.6.w2
- model.layers.8.block_sparse_moe.experts.6.w2
- model.layers.9.block_sparse_moe.experts.6.w2
- model.layers.10.block_sparse_moe.experts.6.w2
- model.layers.11.block_sparse_moe.experts.6.w2
- model.layers.12.block_sparse_moe.experts.6.w2
- model.layers.13.block_sparse_moe.experts.6.w2
- model.layers.5.block_sparse_moe.experts.6.w3
- model.layers.4.block_sparse_moe.experts.6.w3
- model.layers.3.block_sparse_moe.experts.6.w3
- model.layers.2.block_sparse_moe.experts.6.w3
- model.layers.1.block_sparse_moe.experts.6.w3
- model.layers.0.block_sparse_moe.experts.6.w3
- model.layers.6.block_sparse_moe.experts.6.w3
- model.layers.7.block_sparse_moe.experts.6.w3
- model.layers.8.block_sparse_moe.experts.6.w3
- model.layers.9.block_sparse_moe.experts.6.w3
- model.layers.10.block_sparse_moe.experts.6.w3
- model.layers.11.block_sparse_moe.experts.6.w3
- model.layers.12.block_sparse_moe.experts.6.w3
- model.layers.13.block_sparse_moe.experts.6.w3
- model.layers.5.block_sparse_moe.experts.7.w1
- model.layers.6.block_sparse_moe.experts.7.w1
- model.layers.3.block_sparse_moe.experts.7.w1
- model.layers.4.block_sparse_moe.experts.7.w1
- model.layers.2.block_sparse_moe.experts.7.w1
- model.layers.0.block_sparse_moe.experts.7.w1
- model.layers.7.block_sparse_moe.experts.7.w1
- model.layers.8.block_sparse_moe.experts.7.w1
- model.layers.9.block_sparse_moe.experts.7.w1
- model.layers.10.block_sparse_moe.experts.7.w1
- model.layers.11.block_sparse_moe.experts.7.w1
- model.layers.12.block_sparse_moe.experts.7.w1
- model.layers.13.block_sparse_moe.experts.7.w1
- model.layers.14.block_sparse_moe.experts.7.w1
- model.layers.6.block_sparse_moe.experts.7.w2
- model.layers.5.block_sparse_moe.experts.7.w2
- model.layers.4.block_sparse_moe.experts.7.w2
- model.layers.2.block_sparse_moe.experts.7.w2
- model.layers.3.block_sparse_moe.experts.7.w2
- model.layers.1.block_sparse_moe.experts.7.w2
- model.layers.0.block_sparse_moe.experts.7.w2
- model.layers.7.block_sparse_moe.experts.7.w2
- model.layers.8.block_sparse_moe.experts.7.w2
- model.layers.9.block_sparse_moe.experts.7.w2
- model.layers.10.block_sparse_moe.experts.7.w2
- model.layers.11.block_sparse_moe.experts.7.w2
- model.layers.12.block_sparse_moe.experts.7.w2
- model.layers.13.block_sparse_moe.experts.7.w2
- model.layers.6.block_sparse_moe.experts.7.w3
- model.layers.5.block_sparse_moe.experts.7.w3
- model.layers.4.block_sparse_moe.experts.7.w3
- model.layers.3.block_sparse_moe.experts.7.w3
- model.layers.2.block_sparse_moe.experts.7.w3
- model.layers.0.block_sparse_moe.experts.7.w3
- model.layers.7.block_sparse_moe.experts.7.w3
- model.layers.8.block_sparse_moe.experts.7.w3
- model.layers.9.block_sparse_moe.experts.7.w3
- model.layers.10.block_sparse_moe.experts.7.w3
- model.layers.11.block_sparse_moe.experts.7.w3
- model.layers.12.block_sparse_moe.experts.7.w3
- model.layers.13.block_sparse_moe.experts.7.w3
- model.layers.14.block_sparse_moe.experts.7.w3
- model.layers.0.block_sparse_moe.gate
- model.layers.1.block_sparse_moe.gate
- model.layers.2.block_sparse_moe.gate
- model.layers.3.block_sparse_moe.gate
- model.layers.4.block_sparse_moe.gate
- model.layers.5.block_sparse_moe.gate
- model.layers.6.block_sparse_moe.gate
- model.layers.7.block_sparse_moe.gate
- model.layers.8.block_sparse_moe.gate
- model.layers.9.block_sparse_moe.gate
- model.layers.10.block_sparse_moe.gate
- model.layers.11.block_sparse_moe.gate
- model.layers.12.block_sparse_moe.gate
- model.layers.13.block_sparse_moe.gate

model_config:
  output_router_logits: true

datasets:
- path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/Ultrachat200kunfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
  type: sharegpt
  conversation: chatml
- path: /workspace/datasets/dolphin-2.9/SystemConversations.jsonl
  type: sharegpt
  conversation: chatml

chat_template: chatml

dataset_prepared_path: thingy
val_set_size: 0.0002
output_dir: ./out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 3
logging_steps: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2.7e-5

wandb_project: dolphin-2.9-mixtral-8x22b
wandb_watch:
wandb_run_id:
wandb_log_model:

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
# resume_from_checkpoint: /home/ehartford/axolotl/out/checkpoint-316
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
saves_per_epoch: 8
save_total_limit: 2
save_steps:
evals_per_epoch: 4
eval_sample_packing: false
debug:
deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"
tokens:
- "<|im_start|>"
```

</details><br>

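As a quick sanity check on the run above, the effective global batch size implied by this config can be computed directly. A minimal sketch, assuming one data-parallel rank per GPU on the sponsored 8xH100 node (the world size itself is not recorded in the config):

```python
# Effective global batch size implied by the axolotl config above.
micro_batch_size = 4              # from the config
gradient_accumulation_steps = 8   # from the config
world_size = 8                    # assumption: one rank per H100 on the 8-GPU node

sequences_per_step = micro_batch_size * gradient_accumulation_steps * world_size
print(sequences_per_step)  # 256 packed sequences of up to 4096 tokens per optimizer step
```
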
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.7022        | 0.0   | 1    | 0.6989          |
| 0.5344        | 0.25  | 238  | 0.5138          |
| 0.5204        | 0.5   | 476  | 0.5018          |
| 0.5059        | 0.75  | 714  | 0.4951          |
| 0.5112        | 1.0   | 952  | 0.4911          |
| 0.4561        | 1.24  | 1190 | 0.4978          |
| 0.478         | 1.49  | 1428 | 0.4935          |
| 0.4714        | 1.74  | 1666 | 0.4899          |
| 0.4626        | 1.99  | 1904 | 0.4861          |
| 0.3675        | 2.22  | 2142 | 0.5240          |
| 0.3595        | 2.47  | 2380 | 0.5229          |
| 0.3438        | 2.72  | 2618 | 0.5217          |

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
added_tokens.json ADDED
```json
{
  "<|im_end|>": 32000,
  "<|im_start|>": 32001
}
```
config.json ADDED
```json
{
  "_name_or_path": "mistral-community/Mixtral-8x22B-v0.1",
  "architectures": [
    "MixtralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "initializer_range": 0.02,
  "intermediate_size": 16384,
  "max_position_embeddings": 65536,
  "model_type": "mixtral",
  "num_attention_heads": 48,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 56,
  "num_key_value_heads": 8,
  "num_local_experts": 8,
  "output_router_logits": true,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000,
  "router_aux_loss_coef": 0.001,
  "router_jitter_noise": 0.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0.dev0",
  "use_cache": false,
  "vocab_size": 32002,
  "quantization_config": {
    "quant_method": "exl2",
    "version": "0.0.19",
    "bits": 2.5,
    "head_bits": 6,
    "calibration": {
      "rows": 100,
      "length": 2048,
      "dataset": "(default)"
    }
  }
}
```
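
Note the `quantization_config` block: this repository holds an ExLlamaV2 (exl2) quantization at an average of 2.5 bits per weight, with the output head kept at 6 bits. As a rough size check, ~141B parameters × 2.5 / 8 ≈ 44 GB, which lines up with the combined size of the six `output-*.safetensors` shards below (≈ 44.4 GB). A minimal loading sketch with the `exllamav2` library (APIs as of roughly v0.0.19, the version recorded above; the local path is a placeholder and details may shift between releases):

```python
# Load this exl2 quant with exllamav2 and generate a short reply.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config("/path/to/this/repo")  # directory holding config.json + shards
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)        # lazy cache so autosplit can place layers
model.load_autosplit(cache)                     # splits the ~44 GB of weights across GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

prompt = ("<|im_start|>system\nYou are Dolphin, a helpful AI assistant.<|im_end|>\n"
          "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n")
print(generator.generate_simple(prompt, settings, 200))  # up to 200 new tokens
```
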
generation_config.json ADDED
```json
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "transformers_version": "4.40.0.dev0"
}
```
model.safetensors.index.json ADDED
The diff for this file is too large to render.
 
output-00001-of-00006.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:c26489622fc82d96dc36412ff5fd3cb9fc8f526567df4a4354a51d3bb3b3ae18
size 8581408160
```
output-00002-of-00006.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:0b973fb384a25d53513c2490b94f3b46bada690ba4c48737ac03a7efe20ef740
size 8573781384
```
output-00003-of-00006.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:cb7f92a4b1c73b6faf2b405918196cfe7d0fa3d4862da847b0db8e78cb115b40
size 8567156456
```
output-00004-of-00006.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:560e10c8f84bd14ce7386a14677b96eff8ca0dda08864755f445d5bc88847fc2
size 8590043624
```
output-00005-of-00006.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:b6b73ba1f22ec2b490c3f2d9dd30677a96085e94d54832733b5c44bf7f8676fd
size 8585760984
```
output-00006-of-00006.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:0720651ffe68ea17ae84bfcbc5a476f9227c4019792dc1f4c414a262ff89a8f7
size 1474282824
```
special_tokens_map.json ADDED
```json
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer.model ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
```
tokenizer_config.json ADDED
```json
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32000": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32001": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false,
  "use_fast": true
}
```