DRXD1000 committed
Commit ec1535d
1 Parent(s): dcde308

Added SFT Hyperparameters

Files changed (1): README.md +16 -0
README.md CHANGED
@@ -135,6 +135,22 @@ Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
+
+#### SFT Training
+- learning_rate: 2e-05
+- train_batch_size: 32
+- eval_batch_size: 16
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 512
+- total_eval_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- num_epochs: 1
+
+#### DPO Training
 - learning_rate: 5e-07
 - train_batch_size: 8
 - eval_batch_size: 4
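The aggregate batch sizes added in this diff follow from the per-device values by plain arithmetic; a minimal sanity check using only the numbers listed above (the variable names mirror the hyperparameter keys, nothing else is assumed):

```python
# Sanity-check the effective batch sizes from the SFT hyperparameters.
train_batch_size = 32            # per-device train batch
eval_batch_size = 16             # per-device eval batch
num_devices = 8                  # distributed_type: multi-GPU
gradient_accumulation_steps = 2

# Effective train batch = per-device batch * devices * accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Effective eval batch = per-device batch * devices (no accumulation at eval).
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 512, matching total_train_batch_size above
print(total_eval_batch_size)   # 128, matching total_eval_batch_size above
```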