metadata
library_name: transformers
tags:
- generated_from_trainer
datasets:
- kanishka/babylm2-rewritten-clean-spacy
metrics:
- accuracy
model-index:
- name: opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-1024_1e-3
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: kanishka/babylm2-rewritten-clean-spacy
      type: kanishka/babylm2-rewritten-clean-spacy
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.4786608631587111
opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-1024_1e-3
This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset. It achieves the following results on the evaluation set:
- Loss: 2.6845
- Accuracy: 0.4787
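The reported loss can be read as a perplexity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling losses but is not stated in this card. The helper names `perplexity` and `bits_per_token` are illustrative only.

```python
import math

# Values copied from the evaluation results above.
EVAL_LOSS = 2.6845      # assumption: mean per-token cross-entropy, in nats
EVAL_ACCURACY = 0.4787  # fraction of next tokens predicted correctly


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean negative log-likelihood (nats per token) to perplexity."""
    return math.exp(mean_nll_nats)


def bits_per_token(mean_nll_nats: float) -> float:
    """Convert a mean negative log-likelihood from nats to bits per token."""
    return mean_nll_nats / math.log(2)


print(f"perplexity:     {perplexity(EVAL_LOSS):.2f}")
print(f"bits per token: {bits_per_token(EVAL_LOSS):.3f}")
print(f"accuracy:       {EVAL_ACCURACY:.1%}")
```

Under that assumption, the evaluation loss corresponds to a per-token perplexity of roughly 14.65. Because the value is per token, it is only comparable across models that share the same tokenizer.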
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 1024
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
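The hyperparameters above imply an effective batch size and a learning-rate curve, sketched below in plain Python. Two things are assumptions rather than facts from this card: that training ran on a single device, and that the schedule length equals the final optimizer step in the training results (45100). The function `linear_lr` mirrors the usual linear-warmup, linear-decay rule and is not taken from the training script.

```python
# Values copied from the hyperparameter list above.
BASE_LR = 1e-3
WARMUP_STEPS = 32_000
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 8

# Assumptions (not stated in the card): a single device, and a schedule
# length equal to the last optimizer step in the training results.
NUM_DEVICES = 1
TOTAL_STEPS = 45_100


def effective_batch_size(per_device: int = PER_DEVICE_BATCH,
                         accum: int = GRAD_ACCUM_STEPS,
                         devices: int = NUM_DEVICES) -> int:
    """Sequences consumed per optimizer step."""
    return per_device * accum * devices


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:", effective_batch_size())
for step in (0, 16_000, 32_000, 38_550, 45_100):
    print(f"step {step:>6}: lr = {linear_lr(step):.6f}")
```

If these assumptions hold, the effective batch size matches the listed total_train_batch_size of 256, and warmup covers roughly 71% of the run, so the learning rate peaks at 0.001 near step 32000 and decays over the remaining steps.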
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
4.093 | 1.0 | 2256 | 3.8133 | 0.3607 |
3.4493 | 2.0 | 4512 | 3.3059 | 0.4091 |
3.132 | 3.0 | 6768 | 3.0989 | 0.4295 |
2.9232 | 4.0 | 9024 | 2.9917 | 0.4401 |
2.8445 | 5.0 | 11280 | 2.9270 | 0.4466 |
2.7872 | 6.0 | 13536 | 2.8861 | 0.4510 |
2.7438 | 7.0 | 15792 | 2.8637 | 0.4541 |
2.7139 | 8.0 | 18048 | 2.8414 | 0.4565 |
2.6887 | 9.0 | 20304 | 2.8274 | 0.4580 |
2.6686 | 10.0 | 22560 | 2.8194 | 0.4590 |
2.65 | 11.0 | 24816 | 2.8066 | 0.4605 |
2.6481 | 12.0 | 27072 | 2.8002 | 0.4611 |
2.6382 | 13.0 | 29328 | 2.7956 | 0.4618 |
2.6285 | 14.0 | 31584 | 2.7968 | 0.4619 |
2.6116 | 15.0 | 33840 | 2.7692 | 0.4650 |
2.5677 | 16.0 | 36096 | 2.7446 | 0.4684 |
2.5197 | 17.0 | 38352 | 2.7191 | 0.4717 |
2.462 | 18.0 | 40608 | 2.7010 | 0.4747 |
2.3973 | 19.0 | 42864 | 2.6860 | 0.4773 |
2.324 | 19.9915 | 45100 | 2.6845 | 0.4787 |
Framework versions
- Transformers 4.48.0
- Pytorch 2.5.1
- Datasets 3.2.0
- Tokenizers 0.21.0