---
library_name: transformers
tags:
- generated_from_trainer
datasets:
- kanishka/babylm2-rewritten-clean-spacy
metrics:
- accuracy
model-index:
- name: opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-1024_1e-3
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: kanishka/babylm2-rewritten-clean-spacy
      type: kanishka/babylm2-rewritten-clean-spacy
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.4786608631587111
---

# opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-1024_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset.
It achieves the following results on the evaluation set:
- Loss: 2.6845
- Accuracy: 0.4787

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 1024
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 4.093         | 1.0     | 2256  | 3.8133          | 0.3607   |
| 3.4493        | 2.0     | 4512  | 3.3059          | 0.4091   |
| 3.132         | 3.0     | 6768  | 3.0989          | 0.4295   |
| 2.9232        | 4.0     | 9024  | 2.9917          | 0.4401   |
| 2.8445        | 5.0     | 11280 | 2.9270          | 0.4466   |
| 2.7872        | 6.0     | 13536 | 2.8861          | 0.4510   |
| 2.7438        | 7.0     | 15792 | 2.8637          | 0.4541   |
| 2.7139        | 8.0     | 18048 | 2.8414          | 0.4565   |
| 2.6887        | 9.0     | 20304 | 2.8274          | 0.4580   |
| 2.6686        | 10.0    | 22560 | 2.8194          | 0.4590   |
| 2.65          | 11.0    | 24816 | 2.8066          | 0.4605   |
| 2.6481        | 12.0    | 27072 | 2.8002          | 0.4611   |
| 2.6382        | 13.0    | 29328 | 2.7956          | 0.4618   |
| 2.6285        | 14.0    | 31584 | 2.7968          | 0.4619   |
| 2.6116        | 15.0    | 33840 | 2.7692          | 0.4650   |
| 2.5677        | 16.0    | 36096 | 2.7446          | 0.4684   |
| 2.5197        | 17.0    | 38352 | 2.7191          | 0.4717   |
| 2.462         | 18.0    | 40608 | 2.7010          | 0.4747   |
| 2.3973        | 19.0    | 42864 | 2.6860          | 0.4773   |
| 2.324         | 19.9915 | 45100 | 2.6845          | 0.4787   |

### Framework versions

- Transformers 4.48.0
- Pytorch 2.5.1
- Datasets 3.2.0
- Tokenizers 0.21.0
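
The numbers reported under "Training hyperparameters" and "Training results" can be cross-checked with a few lines of arithmetic. The sketch below is not the training code. It uses only the Python standard library and rests on three assumptions that the card does not state: that the evaluation loss is a mean cross-entropy in nats per token (so that its exponential is a perplexity), that the linear schedule rises from 0 to the peak learning rate over the warmup steps and then decays linearly to 0, and that the last logged step (45100) marks the end of the schedule. The names `linear_lr`, `implied_devices`, and `perplexity` are illustrative only.

```python
import math

# Values copied from the card.
PEAK_LR = 1e-3
WARMUP_STEPS = 32_000
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 8
TOTAL_BATCH = 256
FINAL_EVAL_LOSS = 2.6845

# Assumption: the last logged step is taken as the end of the schedule.
TOTAL_STEPS = 45_100


def linear_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


# total_train_batch_size = per-device batch * accumulation steps * devices,
# so the reported numbers imply this many devices.
implied_devices = TOTAL_BATCH // (PER_DEVICE_BATCH * GRAD_ACCUM)

# Assumption: the loss is mean cross-entropy in nats, so exp(loss) is perplexity.
perplexity = math.exp(FINAL_EVAL_LOSS)

if __name__ == "__main__":
    print("implied devices:", implied_devices)
    print("perplexity:", round(perplexity, 2))
    for step in (0, 16_000, 32_000, 38_352, 45_100):
        print(step, linear_lr(step))
```

Under these assumptions the reported batch sizes are consistent with a single device (32 × 8 = 256), the final evaluation loss corresponds to a perplexity of about 14.65, and warmup covers roughly 71% of training, with the learning rate peaking at step 32000, between the epoch 14 and epoch 15 rows of the results table.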