Update README.md
README.md
CHANGED
@@ -6,12 +6,13 @@ Datasets and Hyperparameters
 Reward Model:https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
 SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
 Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+
 Max Prompt Length: 2048
 Max Response Length: 2048
 best_of_n: 2 (2 samples for each prompt)
 Learning Rate: 5e-7
 Beta: 0.1
-Scheduler: Cosine with Warmup and MinLR
+Scheduler: Cosine with Warmup (0.03) and MinLR (0.1)
 Rollout Batch Size: 20000
 Training Batch Size: 256
 Number of Iterations: 9
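The updated scheduler line adds two numbers, 0.03 and 0.1, without saying what they scale. A minimal sketch of one plausible reading follows, assuming 0.03 is the fraction of total optimizer steps spent in linear warmup and 0.1 is the floor of the cosine decay expressed as a fraction of the peak learning rate (5e-7). The function name `lr_at_step` and its parameters are hypothetical and are not taken from the README or from any training library.

```python
import math


def lr_at_step(step, total_steps, peak_lr=5e-7, warmup_ratio=0.03, min_lr_ratio=0.1):
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumptions, not confirmed by the README:
      - warmup_ratio is a fraction of total_steps used for linear warmup
      - min_lr_ratio is the cosine floor as a fraction of peak_lr
    """
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear warmup: ramp up and reach peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps

    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    min_lr = peak_lr * min_lr_ratio
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    total = 1000  # illustrative step count, not a value from the README
    for s in (0, 15, 30, 500, 1000):
        print(s, lr_at_step(s, total))
```

Under these assumptions the rate climbs to 5e-7 during the first 3% of steps and then decays along a half cosine to 5e-8 rather than to zero.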