Update README.md
## Hyperparameters

GRPO settings for RLVR:

- **Learning Rate**: 5 × 10⁻⁷
- **Discount Factor (gamma)**: 1.0
- **Generalized Advantage Estimation (lambda)**: 0.95
- **Mini-batches (N_mb)**: 1
- **PPO's Clipping Coefficient (epsilon)**: 0.2
- **Value Function Coefficient (c1)**: 0.1
- **Gradient Norm Threshold**: 1.0
- **Learning Rate Schedule**: Constant
- **Generation Temperature**: 1.0
- **Batch Size (effective)**: 2
- **Max Token Length**: 2,048
- **Max Prompt Token Length**: 2,048
- **Penalty Reward Value for Responses without an EOS Token**: 0.0
- **Response Length**: 2,048
- **Total Episodes**: 10,000,000
- **KL Penalty Coefficient (beta)**: 0.01
- **Warm-up Ratio (omega)**: 0.0
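For reference, the settings above can be collected into a single Python dictionary, e.g. to pass into a training script. The key names below are illustrative assumptions for this sketch, not the actual flags of any specific trainer:

```python
# Sketch: the GRPO/RLVR settings listed above as a config dict.
# Key names are hypothetical and chosen for readability; map them to
# your training framework's real argument names before use.
grpo_rlvr_config = {
    "learning_rate": 5e-7,            # constant schedule, no warm-up
    "discount_factor_gamma": 1.0,
    "gae_lambda": 0.95,               # Generalized Advantage Estimation
    "num_mini_batches": 1,
    "ppo_clip_epsilon": 0.2,
    "value_function_coef": 0.1,
    "max_grad_norm": 1.0,             # gradient norm threshold
    "lr_schedule": "constant",
    "generation_temperature": 1.0,
    "effective_batch_size": 2,
    "max_token_length": 2048,
    "max_prompt_token_length": 2048,
    "missing_eos_penalty": 0.0,       # penalty reward when no EOS token
    "response_length": 2048,
    "total_episodes": 10_000_000,
    "kl_penalty_coef_beta": 0.01,
    "warmup_ratio": 0.0,
}
```

Keeping the values in one dict like this makes it easy to log the exact configuration alongside a training run.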
## License and use

All Llama 3.1 Tülu3 models are released under Meta's [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/).