Delta-Vector committed on
Commit 8712414 · verified · 1 parent: 009dcc3

Update README.md

Files changed (1)
  1. README.md +346 -20
README.md CHANGED
@@ -1,35 +1,361 @@
  ---
  base_model:
- - unsloth/phi-4
- - NewEden/phi4-pt-v2-out-r1
- library_name: transformers
  tags:
- - mergekit
- - merge
-
  ---
- # phi-pretrain-v3

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the Passthrough merge method using [unsloth/phi-4](https://huggingface.co/unsloth/phi-4) + [NewEden/phi4-pt-v2-out-r1](https://huggingface.co/NewEden/phi4-pt-v2-out-r1) as a base.

- ### Models Merged

- The following models were included in the merge:


- ### Configuration

- The following YAML configuration was used to produce this model:

  ```yaml
- base_model: unsloth/phi-4+NewEden/phi4-pt-v2-out-r1
- dtype: bfloat16
- merge_method: passthrough
- models:
- - model: unsloth/phi-4+NewEden/phi4-pt-v2-out-r1
  ```
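The removed config's `unsloth/phi-4+NewEden/phi4-pt-v2-out-r1` uses mergekit's `model+lora` notation, which applies a LoRA adapter (here presumably `NewEden/phi4-pt-v2-out-r1`) to the base model before the merge runs; with a single input, `passthrough` then effectively writes that adapted model out unchanged. The Python sketch below illustrates only the arithmetic of applying an adapter, W' = W + (alpha / r) · B · A, assuming standard (non-rank-stabilized) LoRA scaling. It is not mergekit's code, and the helper names and toy matrices are hypothetical.

```python
# Illustrative sketch only (not mergekit's implementation): applying a LoRA
# adapter to a base weight matrix, W_merged = W + (alpha / r) * (B @ A).
# Matrices are plain nested lists; the values below are hypothetical toy data.


def matmul(x, y):
    """Multiply matrix x (m x k) by matrix y (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    w: d_out x d_in base weight, a: r x d_in down-projection,
    b: d_out x r up-projection. Assumes standard (non-rank-stabilized) scaling.
    """
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a 2 x 3 base weight and a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x d_in  (1 x 3)
B = [[0.5], [-1.0]]     # d_out x r (2 x 1)

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[2.0, 2.0, 3.0], [-2.0, -3.0, -6.0]]
```

With the R2 config's `lora_r: 128` and `lora_alpha: 16`, the same formula gives a scale of 16 / 128 = 0.125.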
  ---
+ datasets:
+ - NewEden/Orion-LIT
+ - NewEden/Orion-Asstr-Stories-16K
+ - Mielikki/Erebus-87k
  base_model:
+ - Delta-Vector/Hamanasu-15B-R1-PT
  tags:
+ - phi
+ - roleplay
+ - finetune
+ - storywriting
  ---
+ <!DOCTYPE html>
+ <style>
+ html, body {
+ background: black;
+ color: #c9d1d9 !important;
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+ margin: 0;
+ padding: 0;
+ min-height: 100vh;
+ }
+ .markdown-body {
+ color: white;
+ margin: 40px auto;
+ padding: 40px;
+ border-radius: 12px;
+ position: relative;
+ overflow: hidden;
+ }
+
+ .markdown-body::after {
+ content: '';
+ position: absolute;
+ top: 0;
+ left: 0;
+ width: 100%;
+ height: 100%;
+ background: #0c0f18; /* background color */
+ pointer-events: none;
+ z-index: -999;
+ }
+
+ h1, h2, h3 {
+ background: linear-gradient(45deg, #6e00ff, #00ffff);
+ -webkit-background-clip: text;
+ -webkit-text-fill-color: transparent;
+ border-bottom: 1px solid #333;
+ padding-bottom: 0.3em;
+ }
+
+ div[style*="border:2px solid #333"],
+ div[style*="border: 2px solid #333"],
+ div[style*="border:1px solid #333"],
+ div[style*="border: 1px solid #333"] {
+ background: rgba(22, 27, 34, 0.8) !important;
+ border: 2px solid #6e00ff !important;
+ box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);
+ border-radius: 10px;
+ padding: 20px;
+ margin: 20px 0;
+ }
+
+ code {
+ background-color: #1a1a1a !important;
+ border-radius: 4px;
+ padding: 0.2em 0.4em;
+ color: #00ffff;
+ }
+
+ pre {
+ background-color: #1a1a1a !important;
+ border: 1px solid #333;
+ border-radius: 8px;
+ padding: 16px;
+ }
+
+ table {
+ width: 100%;
+ border-collapse: collapse;
+ margin: 20px 0;
+ background: rgba(0,0,0,0.2);
+ table-layout: fixed;
+ color: white;
+ }
+
+ th, td {
+ border: 1px solid #333;
+ padding: 12px;
+ text-align: center;
+ color: white;
+ }
+
+ th {
+ background: rgba(110, 0, 255, 0.1);
+ }
+
+ td:nth-child(1) {
+ width: 1%;
+ white-space: nowrap;
+ }
+
+ td:nth-child(2) {
+ width: 100%;
+ }
+
+ td > span {
+ display: block;
+ padding: 4px 8px;
+ background: rgba(110, 0, 255, 0.1);
+ border-radius: 4px;
+ transition: all 0.3s ease;
+ }
+
+ td > span:hover {
+ background: rgba(110, 0, 255, 0.2);
+ transform: translateY(-1px);
+ }
+
+ a {
+ color: #00ffff;
+ text-decoration: none;
+ transition: all 0.3s ease;
+ }
+
+ a:hover {
+ color: #6e00ff;
+ text-decoration: none;
+ }
+
+ hr {
+ border: 0;
+ height: 1px;
+ background: linear-gradient(90deg, transparent, #333, transparent);
+ margin: 40px 0;
+ }
+
+ img {
+ max-width: 100%;
+ border-radius: 10px;
+ }
+
+ details summary:hover {
+ color: #00ffff;
+ }
+
+ * {
+ color-scheme: dark !important;
+ }
+
+ .prose, .max-w-none, .px-4 {
+ background-color: transparent !important;
+ color: #c9d1d9 !important;
+ }
+ </style>
+ <body>
+ <div class="markdown-body">
+ <div align="center">
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/66c26b6fb01b19d8c3c2467b/o5WjJKA9f95ri9UzRxZQE.png" alt="Model Visualization" width="500px" style="border: 3px solid #333; box-shadow: 0 0 15px rgba(66, 0, 131, 0.5);" />
+
+ <br>
+ <br>
+
+ <div style="font-size:1.5em; font-weight:bold; background: linear-gradient(45deg, #6e00ff, #00ffff); -webkit-background-clip: text; -webkit-text-fill-color: transparent;">
+ Hamanasu 15B R2 PT
+ </div>
+
+ </div>
+
+ <div style="border:1px solid #333; border-radius:10px; padding:20px; margin:20px 0; background: rgba(0,0,0,0.4);">
+
+ ## 🌌 Overview
+
+ <i>This is the second pretrain of Phi-4, continued from the original Asstr-Erebus pretrain. It used 500 million tokens from:</i>
+
+ - `NewEden/Orion-LIT`
+
+ <i>This model has *not* been instruct-tuned, so its ability to converse may be reduced compared with the original model. If you would like to roleplay, please use the Instruct version.</i>

+ </div>

+ <div style="border:2px solid #333; border-radius:10px; padding:20px; background: rgba(0,0,0,0.2);">

+ ### ⚔️ Hardware
+ - 4x RTX 3090 GPUs
+ - Epochs: 1
+ - Base: `Hamanasu-15B-R1-PT`
+ - Tokens: 500 million
+ </div>
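Combining the hardware figures above with the batch settings in the Axolotl config on this card gives a rough size for each optimizer step and for the whole run. This Python sketch assumes each of the four GPUs is one data-parallel rank (the config uses DeepSpeed ZeRO-3), that sample packing fills every 16,384-token sequence, and that the 500 million token figure is exact; the result is an estimate, not a number reported for this run.

```python
# Rough step count for the R2 pretrain. Assumptions (not stated on the card):
#   * each of the 4 GPUs is one data-parallel rank (DeepSpeed ZeRO-3),
#   * sample packing fills every 16,384-token sequence completely,
#   * "500 million" tokens is exact.
NUM_GPUS = 4                  # "4x RTX 3090 GPUs"
MICRO_BATCH_SIZE = 2          # micro_batch_size
GRAD_ACCUM_STEPS = 4          # gradient_accumulation_steps
SEQUENCE_LEN = 16_384         # sequence_len
TOTAL_TOKENS = 500_000_000    # tokens seen in the run
EPOCHS = 1                    # num_epochs
WARMUP_STEPS = 15             # warmup_steps

# Packed sequences consumed by one optimizer step across all GPUs.
sequences_per_step = NUM_GPUS * MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS
tokens_per_step = sequences_per_step * SEQUENCE_LEN

# Ceiling division, assuming a final partial batch still counts as a step.
steps = -(-(TOTAL_TOKENS * EPOCHS) // tokens_per_step)
warmup_fraction = WARMUP_STEPS / steps

print(sequences_per_step, tokens_per_step, steps)  # 32 524288 954
print(f"warmup covers about {warmup_fraction:.1%} of steps")
```

Under these assumptions each step sees about 524K tokens, the run is roughly 954 steps long, and `warmup_steps: 15` covers only about the first 1.6% of training.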
+ </div>

+ <div style="border: 2px solid #6e00ff; border-radius: 10px; padding: 20px; margin: 20px 0; box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);">

+ ## Axolotl Config ꒰(˶• •˶)꒱

+ <details>
+
  ```yaml
+ base_model: Hamanasu-15B-R1-PT
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ #hub_model_id: NewEden/Phi4-pretrain
+ #hub_strategy: "all_checkpoints"
+ #push_dataset_to_hub:
+ #hf_use_auth_token: true
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ #plugins:
+ #  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+
+ #cut_cross_entropy: true
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: NewEden/Orion-LIT
+     type: completion
+     field: text
+ shuffle_merged_datasets: true
+ dataset_prepared_path: prepared_data
+ val_set_size: 0.0
+ output_dir: ./phi4-ptv2-out-r1
+
+ sequence_len: 16384
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ adapter: lora
+ lora_model_dir:
+ lora_r: 128
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+   - gate_proj
+   - down_proj
+   - up_proj
+   - q_proj
+   - v_proj
+   - k_proj
+   - o_proj
+
+ lora_modules_to_save:
+   - embed_tokens
+   - lm_head
+
+
+ wandb_project: mag-phi
+ wandb_entity:
+ wandb_watch:
+ wandb_name: comp-v2-attempt-01
+ wandb_log_model:
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: paged_ademamix_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.00002
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: unsloth
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 15
+ evals_per_epoch: 4
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 4
+ debug:
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
+
  ```
+
+ </details>
+ </div>
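With `sample_packing: true`, several short documents share one `sequence_len` (16,384-token) training sequence instead of each being padded on its own. Axolotl's actual packing logic is more involved; this Python sketch only illustrates the idea as first-fit-decreasing bin packing over hypothetical document lengths, and the function name is made up for illustration.

```python
# Illustrative first-fit-decreasing packing of tokenized documents into
# fixed-capacity training sequences. This is NOT Axolotl's implementation;
# it only shows the idea behind `sample_packing`. Lengths are hypothetical.


def pack_first_fit_decreasing(lengths, capacity):
    """Group document lengths into bins whose totals never exceed capacity.

    Returns a list of bins, each a list of the lengths placed in it.
    Assumes longer texts were split beforehand, so every length fits.
    """
    if any(n > capacity for n in lengths):
        raise ValueError("every document must fit in one sequence")
    bins = []   # lengths placed in each bin
    free = []   # remaining capacity of each bin
    for n in sorted(lengths, reverse=True):
        for i, space in enumerate(free):
            if n <= space:           # first existing bin with enough room
                bins[i].append(n)
                free[i] -= n
                break
        else:                        # nothing fits: open a new bin
            bins.append([n])
            free.append(capacity - n)
    return bins


SEQUENCE_LEN = 16_384  # sequence_len from the config
docs = [12_000, 9_000, 7_000, 4_000, 3_000, 1_500]
packed = pack_first_fit_decreasing(docs, SEQUENCE_LEN)
print(packed)  # [[12000, 4000], [9000, 7000], [3000, 1500]]
```

Here six documents fit into three sequences instead of six. With `pad_to_sequence_len: true`, each packed sequence would then be padded up to the full 16,384 tokens.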
+
+
+ <div align="center">
+
+ <div style="border: 2px solid #6e00ff; border-radius: 10px; padding: 20px; margin: 20px 0; box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);">
+
+ ## ⚡ Credits
+ <div style="display: flex; justify-content: center;">
+ <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 10px; margin: 20px 0; max-width: 600px;">
+
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
+ <a href="https://huggingface.co/lucyknada">
+ <img src="https://img.shields.io/badge/%F0%9F%8C%9F-Lucy_Knada-blueviolet" alt="Lucy Knada">
+ </a>
+ </div>
+
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
+ <a href="https://huggingface.co/jeiku">
+ <img src="https://img.shields.io/badge/%E2%9A%94%EF%B8%8F-jeiku-blueviolet" alt="jeiku">
+ </a>
+ </div>
+
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
+ <a href="https://huggingface.co/intervitens">
+ <img src="https://img.shields.io/badge/%F0%9F%9B%A1%EF%B8%8F-Intervitens-blueviolet" alt="Intervitens">
+ </a>
+ </div>
+
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
+ <a href="https://huggingface.co/kalomaze">
+ <img src="https://img.shields.io/badge/%F0%9F%94%AE-Kalomaze-blueviolet" alt="Kalomaze">
+ </a>
+ </div>
+
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
+ <a href="https://huggingface.co/kubernetes-bad">
+ <img src="https://img.shields.io/badge/%E2%9A%A1-Kubernetes_Bad-blueviolet" alt="Kubernetes Bad">
+ </a>
+ </div>
+
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
+ <a href="https://huggingface.co/anthracite-org">
+ <img src="https://img.shields.io/badge/%F0%9F%8C%91-Anthracite-blueviolet" alt="Anthracite">
+ </a>
+ </div>
+ </div>
+ </div>
+ </div>
+
+ ---
+
+ <div align="center">
+ <div style="font-size:0.8em; opacity:0.8;">Made by</div>
+ <div style="font-size:1.2em; font-weight:bold; background: linear-gradient(45deg, #6e00ff, #00ffff); -webkit-background-clip: text; -webkit-text-fill-color: transparent;">Delta-Vector</div>
+ </div>
+
+ </div>
+ </body>
+ </html>