natolambert committed · Commit ee4a050 · verified · Parent: 6fb28c3

Update README.md

Files changed (1): README.md (+7 −6)
README.md CHANGED

@@ -158,7 +158,7 @@ See the Falcon 180B model card for an example of this.
 ## Hyperparameters
 
 GRPO settings for RLVR:
-- **Learning Rate**: 3 × 10⁻⁷
+- **Learning Rate**: 5 × 10⁻⁷
 - **Discount Factor (gamma)**: 1.0
 - **General Advantage Estimation (lambda)**: 0.95
 - **Mini-batches (N_mb)**: 1
@@ -166,17 +166,18 @@ GRPO settings for RLVR:
 - **PPO's Clipping Coefficient (epsilon)**: 0.2
 - **Value Function Coefficient (c1)**: 0.1
 - **Gradient Norm Threshold**: 1.0
-- **Learning Rate Schedule**: Linear
+- **Learning Rate Schedule**: Constant
 - **Generation Temperature**: 1.0
-- **Batch Size (effective)**: 224
+- **Batch Size (effective)**: 2
 - **Max Token Length**: 2,048
 - **Max Prompt Token Length**: 2,048
-- **Penalty Reward Value for Responses without an EOS Token**: -10.0
+- **Penalty Reward Value for Responses without an EOS Token**: 0.0
 - **Response Length**: 2,048
-- **Total Episodes**: 100,000
-- **KL penalty coefficient (beta)**: 0.05
+- **Total Episodes**: 10,000,000
+- **KL penalty coefficient (beta)**: 0.01
 - **Warm up ratio (omega)**: 0.0
 
+
 ## License and use
 
 All Llama 3.1 Tülu3 models are released under Meta's [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/).
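
For quick reference, the post-commit GRPO settings can be collected into a small Python sketch. This is a hypothetical dataclass for illustration only; the field names are assumptions and do not correspond to the actual open-instruct (or any other) training config.

```python
# Hypothetical sketch: the updated GRPO-for-RLVR hyperparameters from this
# commit, gathered into one place. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GRPOSettings:
    learning_rate: float = 5e-7        # was 3e-7 before this commit
    lr_schedule: str = "constant"      # was "linear"
    gamma: float = 1.0                 # discount factor
    gae_lambda: float = 0.95           # generalized advantage estimation
    num_mini_batches: int = 1          # N_mb
    clip_epsilon: float = 0.2          # PPO clipping coefficient
    value_func_coef: float = 0.1       # c1
    max_grad_norm: float = 1.0         # gradient norm threshold
    temperature: float = 1.0           # generation temperature
    effective_batch_size: int = 2      # was 224
    max_token_length: int = 2048
    max_prompt_token_length: int = 2048
    missing_eos_penalty: float = 0.0   # was -10.0
    response_length: int = 2048
    total_episodes: int = 10_000_000   # was 100,000
    kl_coef: float = 0.01              # beta; was 0.05
    warmup_ratio: float = 0.0          # omega
```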