# collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9331
- Num Input Tokens Seen: 13190464
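
A minimal loading sketch with the `transformers` library is shown below; the repo id is taken from this card, while the dtype and device settings are illustrative assumptions rather than part of the original card.

```python
# Minimal loading sketch (assumptions: the checkpoint is hosted on the Hugging Face
# Hub under the repo id below, and enough GPU memory is available for a 27B model in bf16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32 for a 27B-parameter model
    device_map="auto",           # shard layers across available devices
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```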
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
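
For reference, here is a hypothetical reconstruction of how these values might be expressed with `transformers.TrainingArguments`; the actual training script is not included in this card, and the `output_dir` name is illustrative.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters above onto TrainingArguments;
# the original training script is not part of this card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```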
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.3244        | 0.0184 | 5    | 1.0518          | 240912            |
| 2.2442        | 0.0368 | 10   | 0.9933          | 480908            |
| 2.1347        | 0.0551 | 15   | 0.9797          | 713948            |
| 2.0779        | 0.0735 | 20   | 0.9788          | 953808            |
| 1.6988        | 0.0919 | 25   | 0.9776          | 1202776           |
| 1.6197        | 0.1103 | 30   | 0.9794          | 1447736           |
| 1.5939        | 0.1286 | 35   | 0.9787          | 1694460           |
| 1.391         | 0.1470 | 40   | 0.9787          | 1934204           |
| 1.1954        | 0.1654 | 45   | 0.9771          | 2171112           |
| 1.1232        | 0.1838 | 50   | 0.9747          | 2409548           |
| 1.1961        | 0.2022 | 55   | 0.9722          | 2648484           |
| 0.9664        | 0.2205 | 60   | 0.9710          | 2887652           |
| 1.1064        | 0.2389 | 65   | 0.9667          | 3127516           |
| 1.0085        | 0.2573 | 70   | 0.9611          | 3368304           |
| 0.8056        | 0.2757 | 75   | 0.9606          | 3603000           |
| 0.9106        | 0.2941 | 80   | 0.9576          | 3850976           |
| 0.9384        | 0.3124 | 85   | 0.9544          | 4094752           |
| 0.8953        | 0.3308 | 90   | 0.9521          | 4345860           |
| 0.8928        | 0.3492 | 95   | 0.9511          | 4588756           |
| 0.7887        | 0.3676 | 100  | 0.9490          | 4837704           |
| 0.9092        | 0.3859 | 105  | 0.9497          | 5078112           |
| 0.7458        | 0.4043 | 110  | 0.9471          | 5318968           |
| 0.762         | 0.4227 | 115  | 0.9463          | 5556324           |
| 0.8916        | 0.4411 | 120  | 0.9436          | 5803288           |
| 0.791         | 0.4595 | 125  | 0.9442          | 6042868           |
| 0.9366        | 0.4778 | 130  | 0.9417          | 6282932           |
| 0.8494        | 0.4962 | 135  | 0.9418          | 6522180           |
| 1.0078        | 0.5146 | 140  | 0.9399          | 6773624           |
| 0.9159        | 0.5330 | 145  | 0.9380          | 7011976           |
| 1.0115        | 0.5513 | 150  | 0.9390          | 7257008           |
| 0.84          | 0.5697 | 155  | 0.9380          | 7501580           |
| 0.8987        | 0.5881 | 160  | 0.9393          | 7742124           |
| 0.9589        | 0.6065 | 165  | 0.9370          | 7981768           |
| 0.8201        | 0.6249 | 170  | 0.9371          | 8222304           |
| 0.7601        | 0.6432 | 175  | 0.9348          | 8469856           |
| 0.7465        | 0.6616 | 180  | 0.9378          | 8710912           |
| 0.8689        | 0.6800 | 185  | 0.9381          | 8949132           |
| 0.6945        | 0.6984 | 190  | 0.9343          | 9196744           |
| 0.7289        | 0.7167 | 195  | 0.9358          | 9434412           |
| 0.583         | 0.7351 | 200  | 0.9336          | 9677156           |
| 0.6272        | 0.7535 | 205  | 0.9356          | 9916792           |
| 0.7919        | 0.7719 | 210  | 0.9353          | 10162084          |
| 0.9377        | 0.7903 | 215  | 0.9334          | 10403240          |
| 0.7397        | 0.8086 | 220  | 0.9330          | 10650280          |
| 0.6871        | 0.8270 | 225  | 0.9342          | 10885396          |
| 0.9175        | 0.8454 | 230  | 0.9339          | 11138056          |
| 0.621         | 0.8638 | 235  | 0.9336          | 11382612          |
| 0.8007        | 0.8822 | 240  | 0.9324          | 11620516          |
| 0.691         | 0.9005 | 245  | 0.9353          | 11865444          |
| 0.7516        | 0.9189 | 250  | 0.9329          | 12109276          |
| 0.9474        | 0.9373 | 255  | 0.9326          | 12346224          |
| 0.7389        | 0.9557 | 260  | 0.9335          | 12594020          |
| 0.7986        | 0.9740 | 265  | 0.9310          | 12844164          |
| 0.9011        | 0.9924 | 270  | 0.9335          | 13090264          |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
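
To verify a local environment against these pins, a small check along the following lines can be used (module names are the standard import names for the packages listed above):

```python
# Sanity-check that the local environment matches the versions pinned above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.44.0",
    torch: "2.4.0+cu121",
    datasets: "2.20.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    assert module.__version__ == version, (
        f"{module.__name__}: found {module.__version__}, expected {version}"
    )
```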