alicegoesdown committed (verified)
Commit 32a6ef3 · 1 Parent(s): f8543b6

Adding Readme

Files changed (1)
  1. README.md +112 -0
README.md ADDED
@@ -0,0 +1,112 @@
---
library_name: peft
license: other
base_model: huggyllama/llama-7b
tags:
- axolotl
- generated_from_trainer
model-index:
- name: c4b201cf-0eeb-4380-a91f-cd6329614a81
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
adapter: lora
bf16: auto
chat_template: llama3
dataset_prepared_path: null
debug: null
deepspeed: null
early_stopping_patience: null
eval_max_new_tokens: 128
eval_table_size: null
evals_per_epoch: 1
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_clipping: 0.1
group_by_length: false
hub_repo: null
hub_strategy: end
hub_token: null
learning_rate: 5.0e-07
load_in_4bit: true
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: linear
max_steps: 200
micro_batch_size: 128
mlflow_experiment_name: /tmp/aed51b8e2c089967_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 1
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
saves_per_epoch: 1
sequence_len: 4096
special_tokens:
  pad_token: </PAD>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.25
wandb_entity: null
wandb_mode: online
wandb_name: 6a8f76dd-7262-490a-905c-7b83c0f56891
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 6a8f76dd-7262-490a-905c-7b83c0f56891
warmup_steps: 5
weight_decay: 0.1
xformers_attention: true

```

</details><br>
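
The LoRA settings in the config above (`lora_r: 8`, `lora_alpha: 16`, `lora_dropout: 0.05`, `lora_target_linear: true`) translate roughly into the PEFT setup sketched below. This is a minimal illustration, not the code Axolotl actually runs; in particular, the `target_modules` list is an assumption for a LLaMA-style architecture, since `lora_target_linear: true` simply means "target every linear projection".

```python
# Minimal sketch of the adapter setup implied by the config above.
# The target module names are assumed for huggyllama/llama-7b.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```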

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 128
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- training_steps: 200
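
For readers more used to the raw `transformers` API, the values listed above map roughly onto `TrainingArguments` as in the sketch below. This is illustrative only; Axolotl builds its own trainer arguments internally, so the exact object it constructs may differ.

```python
# Rough TrainingArguments equivalent of the hyperparameters listed above
# (illustrative sketch; not the arguments Axolotl constructs internally).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=5e-7,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=4,
    optim="adamw_bnb_8bit",        # 8-bit AdamW from bitsandbytes
    lr_scheduler_type="cosine",
    warmup_steps=5,
    max_steps=200,
    weight_decay=0.1,
    logging_steps=1,
)
```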

### Framework versions

- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
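
For inference, the adapter can be attached to the base model with PEFT, as in the minimal sketch below. The adapter repo id is an assumption based on the committing user and the model-index name; it is not confirmed by this card, so substitute the actual repository path.

```python
# Hedged inference sketch; the adapter repo id below is assumed, not confirmed.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "huggyllama/llama-7b"
adapter_id = "alicegoesdown/c4b201cf-0eeb-4380-a91f-cd6329614a81"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # load the LoRA weights

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```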