Update README.md
README.md
More information needed

## Training procedure

The [`run_clm.py` script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py) from the transformers library was used. Training was distributed across two NVIDIA Quadro RTX 6000 GPUs:

```bash
TORCH_CPP_LOG_LEVEL=INFO NCCL_DEBUG=INFO CUDA_VISIBLE_DEVICES=0,1 nohup python -m torch.distributed.launch \
  --nproc_per_node=2 run_clm.py --output_dir="./training_nen" \
  --model_type="gpt2" \
  --config_name="./training" \
  --tokenizer_name="./training" \
  --dataset_name="RaiBP/openwebtext2-first-30-chunks-english-only-examples" \
  --do_train \
  --per_device_train_batch_size 8 \
  --block_size="1024" \
  --learning_rate="5e-3" --warmup_steps="1000" \
  --adam_beta1="0.9" --adam_beta2="0.98" --weight_decay="0.01" \
  --overwrite_output_dir \
  --num_train_epochs="1" \
  --logging_steps="500" \
  --save_steps="5000" --preprocessing_num_workers="16" \
  --gradient_accumulation_steps="4" --report_to="tensorboard" \
  --logging_dir="./log_nen" > command_nen_log.log 2>&1 &
```
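
With a per-device batch size of 8, gradient accumulation of 4, and 2 GPUs, this corresponds to an effective global batch size of 8 × 2 × 4 = 64 sequences of 1024 tokens per optimizer step.

Once training finishes, the resulting checkpoint can be loaded with the standard `transformers` auto classes. The snippet below is a minimal sketch, assuming the final checkpoint sits in the `--output_dir` from the command above (`./training_nen`); substitute the Hub repository ID if the model has been pushed to the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed path: the --output_dir used in the training command above.
# Replace with the Hub repo ID if the checkpoint was uploaded.
checkpoint = "./training_nen"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Generate a short continuation from a prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```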

### Training hyperparameters

The following hyperparameters were used during training: