Update README.md
README.md (changed)
@@ -55,7 +55,7 @@ Here is the table summarizing the architecture used for training, along with the
 | label smoothing | 0.05 |
 | optimize | AdamW |
 | betas | 0.9, 0.999 |
-| learning rate |
+| learning rate | 1e-5 |
 | anneal strategy | cos |
 | div factor | 100 |
 | final div factor | 0.1 |
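The parameter names in this table (anneal strategy, div factor, final div factor) match the arguments of PyTorch's `torch.optim.lr_scheduler.OneCycleLR`, so one plausible reading is that training paired AdamW with a one-cycle cosine schedule. The commit does not confirm this. The following is a minimal plain-Python sketch of that schedule's arithmetic. It assumes the newly filled-in learning rate of 1e-5 is the peak rate (`max_lr`) and that the schedule follows OneCycleLR's two-phase semantics. `one_cycle_lr`, `cos_anneal`, `total_steps` and `pct_start` are hypothetical names and values for illustration. The `pct_start` default of 0.3 is PyTorch's default, not a value from this README.

```python
import math


def cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def one_cycle_lr(step: int, total_steps: int, max_lr: float = 1e-5,
                 div_factor: float = 100.0, final_div_factor: float = 0.1,
                 pct_start: float = 0.3) -> float:
    """Learning rate at `step` for a two-phase one-cycle cosine schedule.

    Assumption: mirrors the semantics of PyTorch's OneCycleLR with
    three_phase=False; max_lr=1e-5 is assumed to be the README's learning rate.
    """
    initial_lr = max_lr / div_factor          # 1e-5 / 100 = 1e-7
    min_lr = initial_lr / final_div_factor    # 1e-7 / 0.1 = 1e-6
    warmup_end = float(pct_start * total_steps) - 1
    last_step = total_steps - 1
    if step <= warmup_end:
        # Phase 1: cosine warm-up from initial_lr to max_lr.
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    # Phase 2: cosine decay from max_lr to min_lr.
    pct = (step - warmup_end) / (last_step - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)


if __name__ == "__main__":
    total = 100  # hypothetical number of optimizer steps
    for s in (0, 29, 64, 99):
        print(s, one_cycle_lr(s, total))
```

Under these assumptions, a final div factor below 1 means the run ends at 1e-6. That is ten times the starting rate of 1e-7, not below it. In PyTorch, the corresponding call would be `OneCycleLR(optimizer, max_lr=1e-5, total_steps=..., anneal_strategy="cos", div_factor=100, final_div_factor=0.1)`.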