Triangle104 commited on
Commit
1ec0c56
·
verified ·
1 Parent(s): ec836d6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +183 -0
README.md CHANGED
@@ -28,6 +28,189 @@ model-index:
28
  This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-1.5B-v0.0`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-1.5B-v0.0) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
29
  Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-1.5B-v0.0) for more details on the model.
30
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
31
  ## Use with llama.cpp
32
  Install llama.cpp through brew (works on Mac and Linux)
33
 
 
28
  This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-1.5B-v0.0`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-1.5B-v0.0) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
29
  Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-1.5B-v0.0) for more details on the model.
30
 
31
+ ---
32
+ Model details:
33
+ -
34
+
35
+ A small-scale RP/storywriting specialist model, full-parameter
36
+ finetune of Qwen2.5-1.5B on mixture of synthetic and natural data.
37
+
38
+ It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve
39
+ versatility, creativity and "flavor" of the resulting model.
40
+
41
+ Unlike EVA-D 1.5B v0.0, this model was created without using
42
+ DistillKit, and unlike other versions of EVA, Spectrum wasn't used
43
+ either, since layer freezing is inefficient at small scale.
44
+
45
+
46
+
47
+
48
+
49
+
50
+
51
+
52
+
53
+ Training data:
54
+
55
+
56
+
57
+ Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's card for details.
58
+ Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
59
+ A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
60
+ A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
61
+ Synthstruct and SynthRP datasets by Epiculous
62
+ A subset from Dolphin-2.9.3, including filtered version of not_samantha and a small subset of systemchat.
63
+
64
+
65
+
66
+ Training time and hardware:
67
+
68
+
69
+
70
+ 9 hours on 4x3090Ti
71
+
72
+
73
+
74
+
75
+
76
+
77
+
78
+ Model was created by Kearm, Auri and Cahvay.
79
+
80
+
81
+ Special thanks:
82
+ to Cahvay for his work on investigating and reprocessing the
83
+ corrupted dataset, removing the single biggest source of data poisoning.
84
+ to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data
85
+ and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.
86
+
87
+
88
+
89
+
90
+
91
+ See axolotl config
92
+
93
+
94
+ axolotl version: 0.4.1
95
+
96
+
97
+ base_model: /media/kearm/Disk_2/HF_FAST_MoE_Fodder/Qwen2.5-1.5B
98
+
99
+ load_in_8bit: false
100
+ load_in_4bit: false
101
+ strict: false
102
+
103
+ plugins:
104
+ - axolotl.integrations.liger.LigerPlugin
105
+ liger_rope: true
106
+ liger_rms_norm: true
107
+ liger_swiglu: true
108
+ liger_fused_linear_cross_entropy: true
109
+
110
+ # plugins:
111
+ # - axolotl.integrations.spectrum.SpectrumPlugin
112
+
113
+ # spectrum_top_fraction: 0.5
114
+ # # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
115
+ # spectrum_model_name: Qwen/Qwen2.5-32B
116
+
117
+ datasets:
118
+ - path: datasets/Celeste_Filtered_utf8fix.jsonl
119
+ type: sharegpt
120
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
121
+ type: sharegpt
122
+ - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
123
+ type: sharegpt
124
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
125
+ type: sharegpt
126
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
127
+ type: sharegpt
128
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
129
+ type: sharegpt
130
+ - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
131
+ type: sharegpt
132
+ - path: datasets/S2.jsonl
133
+ type: sharegpt
134
+ - path: datasets/Turing.jsonl
135
+ type: sharegpt
136
+
137
+ chat_template: chatml
138
+ shuffle_merged_datasets: true
139
+ val_set_size: 0.05
140
+ output_dir: EVA-Qwen2.5-1.5B-FFT-v0.0
141
+
142
+ sequence_len: 10240
143
+ sample_packing: true
144
+ eval_sample_packing: false
145
+ pad_to_sequence_len: true
146
+
147
+ # adapter: qlora
148
+ # lora_model_dir:
149
+ # lora_r: 64
150
+ # lora_alpha: 128
151
+ # lora_dropout: 0.05
152
+ # lora_target_linear: true
153
+ # peft_use_dora: true
154
+
155
+ wandb_project: EVA-Qwen2.5-1.5B-FFT-v0.0
156
+ wandb_entity:
157
+ wandb_watch:
158
+ wandb_name: Unit-00
159
+ wandb_log_model:
160
+
161
+ gradient_accumulation_steps: 8
162
+ micro_batch_size: 1
163
+ num_epochs: 3
164
+ optimizer: paged_adamw_8bit
165
+ lr_scheduler: cosine
166
+ learning_rate: 0.000005
167
+ max_grad_norm: 1.5
168
+
169
+ train_on_inputs: false
170
+ group_by_length: false
171
+ bf16: auto
172
+ fp16:
173
+ tf32: false
174
+
175
+ gradient_checkpointing: "unsloth"
176
+ gradient_checkpointing_kwargs:
177
+ use_reentrant: true
178
+ early_stopping_patience:
179
+ resume_from_checkpoint:
180
+ local_rank:
181
+ logging_steps: 1
182
+ xformers_attention:
183
+ flash_attention: true
184
+
185
+ warmup_steps: 20
186
+ evals_per_epoch: 4
187
+ saves_per_epoch: 4
188
+ save_safetensors: true
189
+ save_total_limit: 8
190
+ hub_model_id:
191
+ hub_strategy:
192
+ debug:
193
+ deepspeed: deepspeed_configs/zero3_bf16.json
194
+ weight_decay: 0.15
195
+ # fsdp:
196
+ # - full_shard
197
+ # - auto_wrap
198
+ # fsdp_config:
199
+ # fsdp_limit_all_gathers: true
200
+ # fsdp_sync_module_states: false
201
+ # fsdp_offload_params: true
202
+ # fsdp_cpu_ram_efficient_loading: true
203
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
204
+ # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
205
+ # fsdp_activation_checkpointing: true
206
+ # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
207
+ # fsdp_sharding_strategy: FULL_SHARD
208
+ # fsdp_forward_prefetch: false # Added
209
+ # fsdp_backward_prefetch: "BACKWARD_PRE" # Added
210
+ # fsdp_backward_prefetch_limit: 1 # Added
211
+ # fsdp_mixed_precision: BF16 # Added
212
+
213
+ ---
214
  ## Use with llama.cpp
215
  Install llama.cpp through brew (works on Mac and Linux)
216