OpenRLHF / Llama-3-8b-iter-dpo-179k

Text Generation

text-generation-inference

Inference Endpoints


chuyi777 committed on Jul 14

Commit 9bf875b, 1 parent: 775458a

Update README.md

Files changed (1)

README.md +2 -1

README.md CHANGED

@@ -6,12 +6,13 @@ Datasets and Hyperparameters
 Reward Model:https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
 SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
 Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
 Max Prompt Length: 2048
 Max Response Length: 2048
 best_of_n: 2 (2 samples for each prompt)
 Learning Rate: 5e-7
 Beta: 0.1
-Scheduler: Cosine with Warmup and MinLR
+Scheduler: Cosine with Warmup (0.03) and MinLR (0.1)
 Rollout Batch Size: 20000
 Training Batch Size: 256
 Number of Iterations: 9
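The updated Scheduler line adds two numbers without naming them. A minimal sketch of how such a schedule is commonly computed follows, assuming (this commit does not confirm it) that 0.03 is the warmup length as a fraction of total optimizer steps and 0.1 is the learning-rate floor as a fraction of the peak Learning Rate of 5e-7, which would give a floor of 5e-8. The function name and the step count in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def cosine_with_warmup_and_min_lr(
    step: int,
    total_steps: int,
    peak_lr: float = 5e-7,       # "Learning Rate: 5e-7"
    warmup_ratio: float = 0.03,  # assumed meaning of "Warmup (0.03)"
    min_lr_ratio: float = 0.1,   # assumed meaning of "MinLR (0.1)"
) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay to a floor.

    Hypothetical helper; it assumes warmup_ratio is a fraction of total_steps and
    min_lr_ratio is the floor expressed as a fraction of peak_lr.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    min_lr = peak_lr * min_lr_ratio
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine factor goes from 1 (end of warmup) to 0 (last step).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Decay from peak_lr toward min_lr instead of toward zero.
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    total_steps = 1000  # hypothetical step count, for illustration only
    for step in (0, 15, 30, 500, 1000):
        print(step, f"{cosine_with_warmup_and_min_lr(step, total_steps):.2e}")
```

Keeping a nonzero floor means the final steps of the last iteration still update the policy at a small learning rate rather than stalling at zero.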
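The model name ("iter-dpo") together with the best_of_n and Beta entries suggests an iterative DPO loop: each prompt is sampled twice, the reward model (Llama-3-8b-rm-700k) scores both responses, and the policy is trained on the resulting preference pair. A minimal sketch of one such step follows. It assumes the higher-scoring sample becomes "chosen" and the lower-scoring one "rejected", and it uses the standard DPO loss with Beta = 0.1. It works on plain summed log-probabilities rather than tensors, it is not OpenRLHF's trainer, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def select_pair(
    responses: Sequence[str], rewards: Sequence[float]
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) by reward score, or None if all scores tie.

    Assumption: chosen = highest reward, rejected = lowest reward.
    With best_of_n = 2 this simply orders the two samples.
    """
    if len(responses) != len(rewards) or len(responses) < 2:
        raise ValueError("need at least two responses with one reward each")
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    worst = min(range(len(rewards)), key=lambda i: rewards[i])
    if rewards[best] == rewards[worst]:
        return None  # no preference signal; a caller could skip this prompt
    return responses[best], responses[worst]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # "Beta: 0.1"
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    Each *_logp is the summed log-probability of a full response under either
    the policy being trained or a frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) equals softplus(-z), which avoids overflow for large |z|.
    return softplus(-logits)


if __name__ == "__main__":
    pair = select_pair(["response A", "response B"], [0.7, -0.2])
    print(pair)
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy and reference assign identical log-ratios to both responses, the loss is log 2; it falls below that as the policy shifts probability toward the chosen response relative to the reference.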