Update README.md
README.md
CHANGED
@@ -373,7 +373,7 @@ https://ibm.github.io/model-recycling/
 
 ### Software and training details
 
-The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took
+The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 15 days on Nvidia A30 24GB gpu.
 This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
 
 