kweinmeister committed on
Commit 33207c3 · verified · 1 Parent(s): 435fa12

End of training

Files changed (2)
  1. README.md +306 -0
  2. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,306 @@
---
library_name: peft
license: gemma
base_model: google/gemma-2-27b-it
tags:
- axolotl
- generated_from_trainer
datasets:
- databricks/databricks-dolly-15k
model-index:
- name: gemma-2-27b-it-dolly-15k
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`
```yaml
# base_model: meta-llama/Llama-3.2-1B-Instruct
# # Automatically upload checkpoint and final model to HF
# # hub_model_id: kweinmeister/Llama-3.2-1B-Instruct-MetaMathQA
# hub_model_id: kweinmeister/Llama-3.2-1B-Instruct-gsm8k

# load_in_8bit: false
# load_in_4bit: true
# strict: false


# datasets:
# - path: openai/gsm8k
# type: alpaca_chat.load_qa
# name: "main"
# train_on_split: "train"


# # datasets:
# # - path: meta-math/MetaMathQA
# # type:
# # field_instruction: query
# # field_output: response

# val_set_size: 0.1
# # output_dir: "/mnt/disks/gcs/axolotl/outputs/out"
# output_dir: "/mnt/disks/gcs/axolotl/outputs/gsm8k-out"
# # output_dir: "/mnt/disks/gcs/axolotl/outputs/MetaMathQA-out"

# adapter: qlora
# lora_model_dir:

# sequence_len: 2048
# sample_packing: true
# eval_sample_packing: true
# pad_to_sequence_len: true

# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_fan_in_fan_out:
# lora_target_modules:
# - gate_proj
# - down_proj
# - up_proj
# - q_proj
# - v_proj
# - k_proj
# - o_proj

# wandb_project:
# wandb_entity:
# wandb_watch:
# wandb_name:
# wandb_log_model:

# gradient_accumulation_steps: 4
# micro_batch_size: 2
# num_epochs: 3
# # optimizer: adamw_bnb_8bit
# optimizer: adamw_torch
# lr_scheduler: cosine
# learning_rate: 2e-5

# train_on_inputs: false
# group_by_length: false
# bf16: auto
# fp16:
# tf32: false

# # gradient_checkpointing: true
# gradient_checkpointing: false
# early_stopping_patience:
# resume_from_checkpoint:
# local_rank:
# logging_steps: 1
# xformers_attention:
# flash_attention: true

# loss_watchdog_threshold: 5.0
# loss_watchdog_patience: 3

# warmup_steps: 10
# evals_per_epoch: 4
# eval_table_size:
# eval_max_new_tokens: 128
# saves_per_epoch: 1
# debug:
# deepspeed:
# weight_decay: 0.0
# # fsdp:
# # fsdp_config:
# fsdp:
# - full_shard
# - auto_wrap
# fsdp_config:
# fsdp_limit_all_gathers: true
# fsdp_sync_module_states: true
# fsdp_offload_params: true
# fsdp_use_orig_params: false
# fsdp_cpu_ram_efficient_loading: true
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
# fsdp_state_dict_type: FULL_STATE_DICT
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_activation_checkpointing: true
# special_tokens:
# # pad_token: "<|end_of_text|>"
# special_tokens:
# bos_token: "<|begin_of_text|>"
# eos_token: "<|eot_id|>"
# pad_token: "<|finetune_right_pad_id|>"

base_model: google/gemma-2-27b-it
# model_type: AutoModelForCausalLM
# tokenizer_type: AutoTokenizer
hub_model_id: kweinmeister/gemma-2-27b-it-dolly-15k

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: databricks/databricks-dolly-15k
    type:
      field_instruction: instruction
      field_input: context
      field_output: response

val_set_size: 0.1
output_dir: "/mnt/disks/gcs/axolotl/outputs/dolly-15k-out"

adapter: qlora

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

sequence_len: 2048
sample_packing: true
# eval_sample_packing: true
pad_to_sequence_len: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
# optimizer: adamw_bnb_8bit
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 2e-5


train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false


# gradient_checkpointing: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

# loss_watchdog_threshold: 5.0
# loss_watchdog_patience: 3


warmup_ratio: 0.1
evals_per_epoch: 4
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
# deepspeed:
weight_decay: 0.0

deepspeed: deepspeed_configs/zero1.json

fsdp:
fsdp_config:
# fsdp:
# - full_shard
# - auto_wrap

# fsdp_config:
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_backward_prefetch: BACKWARD_PRE
# fsdp_cpu_ram_efficient_loading: true
# fsdp_forward_prefetch: false
# fsdp_offload_params: true
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_state_dict_type: SHARDED_STATE_DICT
# fsdp_transformer_layer_cls_to_wrap: GemmaDecoderLayer
# fsdp_sync_module_states: true
# fsdp_use_orig_params: true


# fsdp_config:
# fsdp_limit_all_gathers: true
# fsdp_sync_module_states: true
# fsdp_offload_params: true
# fsdp_use_orig_params: false
# fsdp_cpu_ram_efficient_loading: true
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_transformer_layer_cls_to_wrap: GemmaDecoderLayer
# fsdp_state_dict_type: FULL_STATE_DICT
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_activation_checkpointing: true
# special_tokens:
# # pad_token: "<|end_of_text|>"
# special_tokens:
# bos_token: "<|begin_of_text|>"
# eos_token: "<|eot_id|>"
# pad_token: "<|finetune_right_pad_id|>"
```

</details><br>
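
With axolotl `0.6.0` installed, a config like the one above is typically launched with the project's CLI, e.g. `axolotl train config.yaml` or `accelerate launch -m axolotl.cli.train config.yaml`. This is only a sketch: `config.yaml` is a placeholder filename, the exact entry point and flags depend on the axolotl version, and the `deepspeed: deepspeed_configs/zero1.json` setting assumes the DeepSpeed configs shipped with axolotl are available locally.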

# gemma-2-27b-it-dolly-15k

This model is a fine-tuned version of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) on the databricks/databricks-dolly-15k dataset.
It achieves the following results on the evaluation set:
- Loss: 1.6809
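
This repository contains a PEFT (QLoRA) adapter rather than full model weights, so inference requires loading the base model and attaching the adapter. The sketch below shows one way to do that; the dtype/device settings, prompt, and generation length are illustrative assumptions, not part of this card.

```python
# Minimal inference sketch: load the base Gemma 2 27B model and attach the
# LoRA adapter published in this repo. Settings are illustrative.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2-27b-it"
adapter_id = "kweinmeister/gemma-2-27b-it-dolly-15k"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference, in line with `bf16: auto` above
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Dolly-style instruction prompt (illustrative example).
messages = [{"role": "user", "content": "Explain what instruction tuning is in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, the adapter can also be merged into the base weights with `merge_and_unload()`.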

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 23
- num_epochs: 3
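
The derived values above follow directly from the axolotl config. A small sanity-check sketch (all inputs come from this card; the total optimizer step count is an estimate read off the step/epoch columns of the results table below):

```python
# Sanity check of the derived batch size and warmup steps.
micro_batch_size = 2              # micro_batch_size in the config
gradient_accumulation_steps = 4   # gradient_accumulation_steps in the config
num_devices = 2                   # num_devices reported above

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)           # 16, as reported above

approx_total_steps = 232                # estimate: ~77 steps/epoch * 3 epochs (step 220 ~= epoch 2.84)
print(round(0.1 * approx_total_steps))  # ~23, matching lr_scheduler_warmup_steps with warmup_ratio: 0.1
```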

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.8741        | 0.0129 | 1    | 4.1287          |
| 3.5275        | 0.2589 | 20   | 3.7627          |
| 2.5496        | 0.5178 | 40   | 2.5361          |
| 2.1047        | 0.7767 | 60   | 2.0215          |
| 1.8435        | 1.0259 | 80   | 1.8475          |
| 1.8821        | 1.2848 | 100  | 1.7748          |
| 1.834         | 1.5437 | 120  | 1.7345          |
| 1.7633        | 1.8026 | 140  | 1.7098          |
| 1.6382        | 2.0647 | 160  | 1.6954          |
| 1.9356        | 2.3236 | 180  | 1.6863          |
| 1.6196        | 2.5825 | 200  | 1.6819          |
| 1.7489        | 2.8414 | 220  | 1.6809          |


### Framework versions

- PEFT 0.14.0
- Transformers 4.47.1
- Pytorch 2.3.1+cu121
- Datasets 3.1.0
- Tokenizers 0.21.0
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89ea0874183b876bfbac6d55eff0dbfeaf282abd3be07d67d0ef8029990dc192
size 456822394
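
The committed `adapter_model.bin` is a Git LFS pointer (spec version, SHA-256 object ID, and size of 456,822,394 bytes) rather than the adapter weights themselves. A minimal sketch for checking a locally downloaded copy against the recorded object ID (the local path is a placeholder):

```python
# Sketch: verify a downloaded adapter_model.bin against the LFS pointer's sha256 oid.
import hashlib

expected_oid = "89ea0874183b876bfbac6d55eff0dbfeaf282abd3be07d67d0ef8029990dc192"
path = "adapter_model.bin"  # placeholder: path to the downloaded file

sha256 = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)

print(sha256.hexdigest() == expected_oid)
```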