natolambert committed · Commit ee4a050 · verified · Parent: 6fb28c3

Update README.md

Files changed (1): README.md (+7 −6)
README.md CHANGED

@@ -158,7 +158,7 @@ See the Falcon 180B model card for an example of this.
 ## Hyperparameters
 
 GRPO settings for RLVR:
-- **Learning Rate**: 3 × 10⁻⁷
+- **Learning Rate**: 5 × 10⁻⁷
 - **Discount Factor (gamma)**: 1.0
 - **General Advantage Estimation (lambda)**: 0.95
 - **Mini-batches (N_mb)**: 1
@@ -166,17 +166,18 @@ GRPO settings for RLVR:
 - **PPO's Clipping Coefficient (epsilon)**: 0.2
 - **Value Function Coefficient (c1)**: 0.1
 - **Gradient Norm Threshold**: 1.0
-- **Learning Rate Schedule**: Linear
+- **Learning Rate Schedule**: Constant
 - **Generation Temperature**: 1.0
-- **Batch Size (effective)**: 224
+- **Batch Size (effective)**: 2
 - **Max Token Length**: 2,048
 - **Max Prompt Token Length**: 2,048
-- **Penalty Reward Value for Responses without an EOS Token**: -10.0
+- **Penalty Reward Value for Responses without an EOS Token**: 0.0
 - **Response Length**: 2,048
-- **Total Episodes**: 100,000
-- **KL penalty coefficient (beta)**: 0.05
+- **Total Episodes**: 10,000,000
+- **KL penalty coefficient (beta)**: 0.01
 - **Warm up ratio (omega)**: 0.0
 
+
 ## License and use
 
 All Llama 3.1 Tülu3 models are released under Meta's [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/).
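
For quick reference, the post-commit GRPO settings can be collected into a small Python sketch. This is a hypothetical dataclass for illustration only; the field names are assumptions and do not correspond to the actual open-instruct (or any other) training config.

```python
# Hypothetical sketch: the updated GRPO-for-RLVR hyperparameters from this
# commit, gathered into one place. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GRPOSettings:
    learning_rate: float = 5e-7        # was 3e-7 before this commit
    lr_schedule: str = "constant"      # was "linear"
    gamma: float = 1.0                 # discount factor
    gae_lambda: float = 0.95           # generalized advantage estimation
    num_mini_batches: int = 1          # N_mb
    clip_epsilon: float = 0.2          # PPO clipping coefficient
    value_func_coef: float = 0.1       # c1
    max_grad_norm: float = 1.0         # gradient norm threshold
    temperature: float = 1.0           # generation temperature
    effective_batch_size: int = 2      # was 224
    max_token_length: int = 2048
    max_prompt_token_length: int = 2048
    missing_eos_penalty: float = 0.0   # was -10.0
    response_length: int = 2048
    total_episodes: int = 10_000_000   # was 100,000
    kl_coef: float = 0.01              # beta; was 0.05
    warmup_ratio: float = 0.0          # omega
```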