Kquant03 committed
Commit 86a308d
1 Parent(s): 60a51df

End of training

README.md ADDED
---
library_name: transformers
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- axolotl
- generated_from_trainer
model-index:
- name: L3.1-Pneuma-8B
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.5.0`
```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct

load_in_8bit: false
load_in_4bit: false
strict: false

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Sandevistan_cleaned.jsonl
    type: customllama3_stan
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./outputs/out

fix_untrained_tokens: true

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project: Pneuma
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 8
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0000078
max_grad_norm: 1

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
eval_sample_packing: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

hub_model_id: Replete-AI/L3.1-Pneuma-8B
hub_strategy: every_save

warmup_steps: 0
evals_per_epoch: 3
eval_table_size:
saves_per_epoch: 3
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|begin_of_text|>"
  eos_token: "<|end_of_text|>"
  pad_token: "<|end_of_text|>"
tokens:

```

</details><br>

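The `special_tokens` block in the config above pins `bos_token`, `eos_token`, and `pad_token`. A quick, hedged way to confirm the published tokenizer picked these up, assuming the tokenizer was pushed alongside the weights to the `Replete-AI/L3.1-Pneuma-8B` repo named in `hub_model_id`:

```python
from transformers import AutoTokenizer

# Sketch only: verify the special tokens set in the axolotl config above.
# Assumes the tokenizer lives in the same hub repo as the weights.
tok = AutoTokenizer.from_pretrained("Replete-AI/L3.1-Pneuma-8B")
print(tok.bos_token)  # expected: <|begin_of_text|>
print(tok.eos_token)  # expected: <|end_of_text|>
print(tok.pad_token)  # expected: <|end_of_text|>
```
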
# L3.1-Pneuma-8B

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the Sandevistan_cleaned.jsonl dataset listed in the axolotl config above.
It achieves the following results on the evaluation set:
- Loss: 2.4357

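A minimal usage sketch, assuming the checkpoint is published at `Replete-AI/L3.1-Pneuma-8B` (the `hub_model_id` in the config above) and keeps the base model's chat template; adjust dtype and device placement for your hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the axolotl config's hub_model_id; this is a sketch, not
# an official usage example from the card.
model_id = "Replete-AI/L3.1-Pneuma-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
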
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Per the axolotl config above, training used `Sandevistan_cleaned.jsonl` (dataset type `customllama3_stan`), with 5% of the data held out as the evaluation split (`val_set_size: 0.05`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7.8e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (see the sanity check below)
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 2

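The effective batch size above follows from the micro-batch size and gradient-accumulation steps; a quick sanity check, assuming a single training process (which is what makes the arithmetic come out to 128):

```python
# Sanity check of the reported effective batch size.
micro_batch_size = 8              # per-device batch size (train_batch_size above)
gradient_accumulation_steps = 16  # from the axolotl config
world_size = 1                    # assumption: one process, since 8 * 16 = 128

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * world_size
print(total_train_batch_size)     # 128
```
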
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0731        | 0.0023 | 1    | 2.7679          |
| 0.6458        | 0.3338 | 143  | 2.4576          |
| 0.6504        | 0.6675 | 286  | 2.4407          |
| 1.112         | 1.0019 | 429  | 2.4358          |
| 0.6014        | 1.3357 | 572  | 2.4358          |
| 0.6194        | 1.6694 | 715  | 2.4357          |

### Framework versions

- Transformers 4.46.1
- Pytorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.3
pytorch_model-00001-of-00004.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4cf1484a5ce2b4a67212e937297f10ce722f2fcbea3399033b77d8fd48c95049
+ oid sha256:c193c5a499a783e120720a2f9381a9f423b4ad57725fe059c103c499475f9513
  size 4976718466
pytorch_model-00002-of-00004.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:107d76eca0016426a9da33b0c4afe3a44ceb0d656bb2e4b78ef6e06e38e1c24c
+ oid sha256:7884763202517d8e76b0f1a6fb0347575cda975361c88c6beb78df7db5821b28
  size 4999827718
pytorch_model-00003-of-00004.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e847bf7dfd1c8f35a334223f8479581b2acf32712281c693f2d96fad13c710e6
+ oid sha256:a7c0615670eb29643656d5077816df96be5ecb57e85b0cb6ddd13e8bd3820e7f
  size 4915940170
pytorch_model-00004-of-00004.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f6fd784ac6435dd67ba148408fdadd606467181f964363368f8fa28f3a9aa1b7
+ oid sha256:bb4737a666f20d6bd08121a387491858e5824dc0f79948d1a6a0904e971edaec
  size 1168140873