Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ Max Response Length: 2048
 best_of_n: 2 (2 samples for each prompt)
 Learning Rate: 5e-7
 Beta: 0.1
-Scheduler: Cosine with Warmup (0.03) and MinLR (0.1)
+Scheduler: Cosine with Warmup (0.03) and MinLR (0.1 * init_lr)
 Rollout Batch Size: 20000
 Training Batch Size: 256
 Number of Iterations: 9
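The changed line states that MinLR is a fraction of the initial learning rate rather than an absolute value. Below is a minimal sketch of what such a schedule might look like, assuming linear warmup over the first 3% of steps followed by cosine decay down to 0.1 * init_lr. The function name `lr_at_step`, its parameters, and the `total_steps` value are hypothetical and not taken from the actual training code; the reading of 0.03 as a warmup ratio is also an assumption.

```python
import math


def lr_at_step(step, total_steps, init_lr=5e-7, warmup_ratio=0.03, min_lr_ratio=0.1):
    """Hypothetical cosine schedule with linear warmup and a learning-rate floor.

    Assumptions (not confirmed by the README): warmup is linear, 0.03 is a
    fraction of total_steps, and the floor is min_lr_ratio * init_lr.
    """
    warmup_steps = int(warmup_ratio * total_steps)
    min_lr = min_lr_ratio * init_lr

    # Linear warmup: ramp up to init_lr over the first warmup_steps steps.
    if warmup_steps > 0 and step < warmup_steps:
        return init_lr * (step + 1) / warmup_steps

    # Cosine decay from init_lr down to min_lr over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (init_lr - min_lr) * cosine


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 29, 30, 500, 1000):
        print(s, lr_at_step(s, total))
```

Under this reading, with Learning Rate 5e-7 the schedule never drops below 5e-8, whereas an absolute MinLR of 0.1 would be larger than the peak learning rate itself.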