AuriAetherwiing committed
Commit e64f352 · verified · 1 Parent(s): 69495a0

Update README.md

Files changed (1)
  1. README.md +48 -63
README.md CHANGED
@@ -5,7 +5,54 @@ tags:
  model-index:
  - name: EVA-Qwen2.5-1.5B-FFT-v0.0
  results: []
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen2.5-1.5B
+ datasets:
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - Nopm/Opus_WritingStruct
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
+ - allura-org/Celeste-1.x-data-mixture
+ - cognitivecomputations/dolphin-2.9.3
  ---
+ # EVA Qwen2.5-1.5B v0.0
+
+ <p>
+ A small-scale RP/storywriting specialist model, a full-parameter finetune of Qwen2.5-1.5B on a mixture of synthetic and natural data.<br>
+ It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.<br>
+ Unlike EVA-D 1.5B v0.0, this model was created without using DistillKit, and unlike other versions of EVA, Spectrum wasn't used either, since layer freezing is inefficient at small scale.
+ </p>
+
+ <p>
+ <br>
+ <h3>
+ Training data:
+ </h3>
+ <ul>
+ <li>Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+ <li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
+ <li>A subset (2k rows) of Sonnet3.5-Charcard-Roleplay by Gryphe</li>
+ <li>Synthstruct and SynthRP datasets by Epiculous</li>
+ <li>A subset of Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
+ </ul>
+ <h3>
+ Training time and hardware:
+ </h3>
+ <ul><li>TBA hours on 4x3090Ti</li></ul><br>
+ </p>
+ <p>The model was created by Kearm, Auri and Cahvay.</p>
+ <h4>Special thanks:</h4><ul>
+ <li><b>to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.</b></li>
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
+ <li>and to Allura-org for support, feedback, beta-testing and quality control of EVA models.</li></ul>

  Model card in progress, making public for benchmarks.

@@ -132,66 +179,4 @@ weight_decay: 0.15

  ```

- </details><br>
-
- # EVA-Qwen2.5-1.5B-FFT-v0.0
-
- This model was trained from scratch on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.3685
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-06
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 32
- - total_eval_batch_size: 4
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 20
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.8166 | 0.0028 | 1 | 1.6772 |
- | 1.7031 | 0.2519 | 89 | 1.4633 |
- | 1.5925 | 0.5037 | 178 | 1.4171 |
- | 1.512 | 0.7556 | 267 | 1.3993 |
- | 1.5122 | 1.0050 | 356 | 1.3888 |
- | 1.5281 | 1.2574 | 445 | 1.3825 |
- | 1.4895 | 1.5099 | 534 | 1.3775 |
- | 1.4599 | 1.7624 | 623 | 1.3731 |
- | 1.4754 | 2.0103 | 712 | 1.3705 |
- | 1.4841 | 2.2619 | 801 | 1.3696 |
- | 1.4861 | 2.5136 | 890 | 1.3689 |
- | 1.5258 | 2.7653 | 979 | 1.3685 |
-
-
- ### Framework versions
-
- - Transformers 4.45.2
- - Pytorch 2.5.1+cu124
- - Datasets 2.21.0
- - Tokenizers 0.20.3
+ </details><br>
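For reference, a minimal usage sketch for the resulting checkpoint, assuming the standard Hugging Face transformers API and that the tokenizer retains Qwen2.5's ChatML chat template; the repo id below is a placeholder, not a path confirmed by this commit:

```python
# Minimal sketch: load the finetuned checkpoint with transformers and run one
# creative-writing prompt. The repo id is a placeholder — substitute the actual
# "org/name" path of this model on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "EVA-Qwen2.5-1.5B-FFT-v0.0"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # BF16 full-parameter finetune; ~3 GB of weights at 1.5B
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative storywriting assistant."},
    {"role": "user", "content": "Write the opening paragraph of a story set on a generation ship."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The sampler settings above are illustrative only; the card does not prescribe generation parameters.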