Commit 9726ae9 by natolambert (verified) · 1 parent: ee4a050

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -160,11 +160,9 @@ See the Falcon 180B model card for an example of this.
 GRPO settings for RLVR:
 - **Learning Rate**: 5 × 10⁻⁷
 - **Discount Factor (gamma)**: 1.0
-- **General Advantage Estimation (lambda)**: 0.95
 - **Mini-batches (N_mb)**: 1
-- **PPO Update Iterations (K)**: 4
-- **PPO's Clipping Coefficient (epsilon)**: 0.2
-- **Value Function Coefficient (c1)**: 0.1
+- **Update Iterations (K)**: 4
+- **Clipping Coefficient (epsilon)**: 0.2
 - **Gradient Norm Threshold**: 1.0
 - **Learning Rate Schedule**: Constant
 - **Generation Temperature**: 1.0
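This change removes the GAE lambda and value-function coefficient entries, both of which only matter when a learned critic estimates advantages. GRPO has no critic: it scores each completion relative to the other completions sampled for the same prompt. That is presumably why those entries were dropped, and why the "PPO" prefix was removed from the remaining ones. Below is a minimal Python sketch under stated assumptions, not the project's training code. The field and function names (`GRPOConfig`, `group_relative_advantages`, `clipped_surrogate`) are hypothetical. The advantage normalization uses the population standard deviation, which is one common choice among several.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GRPOConfig:
    # Values taken from the updated README list; field names are hypothetical.
    learning_rate: float = 5e-7
    gamma: float = 1.0           # discount factor
    num_mini_batches: int = 1    # N_mb
    update_iterations: int = 4   # K: optimization passes per batch of rollouts
    clip_epsilon: float = 0.2    # clipping coefficient
    max_grad_norm: float = 1.0   # gradient norm threshold
    lr_schedule: str = "constant"
    temperature: float = 1.0     # sampling temperature for generation


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against the group sampled for the
    same prompt. No value function (and therefore no GAE lambda) is needed.
    Assumption: population std; implementations differ on this detail."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_epsilon: float) -> float:
    """PPO-style clipped objective term (to be maximized) for a single token,
    where ratio = pi_new(token) / pi_old(token)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_epsilon), 1.0 + clip_epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    cfg = GRPOConfig()
    # Four completions for one prompt, scored by a 0/1 verifiable reward.
    advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print(advs)
    print(clipped_surrogate(1.5, advs[0], cfg.clip_epsilon))
```

With rewards `[1, 0, 0, 1]`, the correct completions get an advantage of about +1 and the incorrect ones about -1. A policy ratio of 1.5 on a positive-advantage token is clipped to 1 + epsilon = 1.2. In a full loop, the same batch of rollouts would be reused for `update_iterations` (K = 4) optimizer passes.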