Canwen Xu committed on
Commit
ce1c7c6
1 Parent(s): 05d7855

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -41,7 +41,7 @@ We collect different kinds of texts in our pre-training, including encyclopedia,
 
 ## Training procedure
 
- Based on the hyper-parameter searching on the learning rate and batch size, we set the learning rate as $1.5\times10^{-4}$ and the batch size as $3,072$, which makes the model training more stable. In the first version, we still adopt the dense attention and the max sequence length is $1,024$. We will implement sparse attention in the future. We pre-train our model for $20,000$ steps, and the first $5,000$ steps are for warm-up. The optimizer is Adam. It takes two weeks to train our largest model using $64$ NVIDIA V100.
+ Based on the hyper-parameter searching on the learning rate and batch size, we set the learning rate as \\(1.5\times10^{-4}\\) and the batch size as \\(3,072\\), which makes the model training more stable. In the first version, we still adopt the dense attention and the max sequence length is \\(1,024\\). We will implement sparse attention in the future. We pre-train our model for \\(20,000\\) steps, and the first \\(5,000\\) steps are for warm-up. The optimizer is Adam. It takes two weeks to train our largest model using \\(64\\) NVIDIA V100.
 
 ## Eval results
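The training-procedure paragraph in the diff states a peak learning rate of \\(1.5\times10^{-4}\\) with the first 5,000 of 20,000 Adam steps used for warm-up. As a minimal sketch of what that schedule implies, the snippet below computes the learning rate at a given step, assuming linear warm-up; the README does not specify the warm-up shape or the post-warm-up schedule, so holding the rate constant afterwards is an assumption, and `learning_rate_at` is a hypothetical helper, not code from the repository.

```python
# Hyperparameters quoted in the README paragraph above.
LEARNING_RATE = 1.5e-4   # peak learning rate
WARMUP_STEPS = 5_000     # "the first 5,000 steps are for warm-up"
TOTAL_STEPS = 20_000     # total pre-training steps

def learning_rate_at(step: int) -> float:
    """Learning rate at a given optimizer step.

    Assumes linear warm-up to the peak rate over WARMUP_STEPS,
    then a constant rate for the remaining steps (the README does
    not state the schedule beyond the warm-up count).
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    return LEARNING_RATE
```

For example, halfway through warm-up (step 2,500) the rate would be \\(7.5\times10^{-5}\\), and it reaches the full \\(1.5\times10^{-4}\\) at step 5,000.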