collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9331
  • Num Input Tokens Seen: 13190464
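
A minimal loading sketch for this checkpoint, assuming it is pulled from the Hugging Face Hub under the repository id shown in the model tree at the end of this card; the prompt and generation settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from the model tree below; adjust if you host the weights elsewhere.
model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # a 27.2B model needs to be sharded across available devices
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```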

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
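
Note that total_train_batch_size follows from train_batch_size × gradient_accumulation_steps = 4 × 32 = 128. As a hypothetical reconstruction (the original training script is not part of this card), these settings map onto Hugging Face TrainingArguments roughly as follows:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above;
# the actual training script is not included in this card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # The card lists Adam with these betas/epsilon; Trainer's default AdamW
    # with the same settings is the closest match (assumption).
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, matching the BF16 tensor type of the checkpoint
)
```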

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.3244        | 0.0184 | 5    | 1.0518          | 240912            |
| 2.2442        | 0.0368 | 10   | 0.9933          | 480908            |
| 2.1347        | 0.0551 | 15   | 0.9797          | 713948            |
| 2.0779        | 0.0735 | 20   | 0.9788          | 953808            |
| 1.6988        | 0.0919 | 25   | 0.9776          | 1202776           |
| 1.6197        | 0.1103 | 30   | 0.9794          | 1447736           |
| 1.5939        | 0.1286 | 35   | 0.9787          | 1694460           |
| 1.391         | 0.1470 | 40   | 0.9787          | 1934204           |
| 1.1954        | 0.1654 | 45   | 0.9771          | 2171112           |
| 1.1232        | 0.1838 | 50   | 0.9747          | 2409548           |
| 1.1961        | 0.2022 | 55   | 0.9722          | 2648484           |
| 0.9664        | 0.2205 | 60   | 0.9710          | 2887652           |
| 1.1064        | 0.2389 | 65   | 0.9667          | 3127516           |
| 1.0085        | 0.2573 | 70   | 0.9611          | 3368304           |
| 0.8056        | 0.2757 | 75   | 0.9606          | 3603000           |
| 0.9106        | 0.2941 | 80   | 0.9576          | 3850976           |
| 0.9384        | 0.3124 | 85   | 0.9544          | 4094752           |
| 0.8953        | 0.3308 | 90   | 0.9521          | 4345860           |
| 0.8928        | 0.3492 | 95   | 0.9511          | 4588756           |
| 0.7887        | 0.3676 | 100  | 0.9490          | 4837704           |
| 0.9092        | 0.3859 | 105  | 0.9497          | 5078112           |
| 0.7458        | 0.4043 | 110  | 0.9471          | 5318968           |
| 0.762         | 0.4227 | 115  | 0.9463          | 5556324           |
| 0.8916        | 0.4411 | 120  | 0.9436          | 5803288           |
| 0.791         | 0.4595 | 125  | 0.9442          | 6042868           |
| 0.9366        | 0.4778 | 130  | 0.9417          | 6282932           |
| 0.8494        | 0.4962 | 135  | 0.9418          | 6522180           |
| 1.0078        | 0.5146 | 140  | 0.9399          | 6773624           |
| 0.9159        | 0.5330 | 145  | 0.9380          | 7011976           |
| 1.0115        | 0.5513 | 150  | 0.9390          | 7257008           |
| 0.84          | 0.5697 | 155  | 0.9380          | 7501580           |
| 0.8987        | 0.5881 | 160  | 0.9393          | 7742124           |
| 0.9589        | 0.6065 | 165  | 0.9370          | 7981768           |
| 0.8201        | 0.6249 | 170  | 0.9371          | 8222304           |
| 0.7601        | 0.6432 | 175  | 0.9348          | 8469856           |
| 0.7465        | 0.6616 | 180  | 0.9378          | 8710912           |
| 0.8689        | 0.6800 | 185  | 0.9381          | 8949132           |
| 0.6945        | 0.6984 | 190  | 0.9343          | 9196744           |
| 0.7289        | 0.7167 | 195  | 0.9358          | 9434412           |
| 0.583         | 0.7351 | 200  | 0.9336          | 9677156           |
| 0.6272        | 0.7535 | 205  | 0.9356          | 9916792           |
| 0.7919        | 0.7719 | 210  | 0.9353          | 10162084          |
| 0.9377        | 0.7903 | 215  | 0.9334          | 10403240          |
| 0.7397        | 0.8086 | 220  | 0.9330          | 10650280          |
| 0.6871        | 0.8270 | 225  | 0.9342          | 10885396          |
| 0.9175        | 0.8454 | 230  | 0.9339          | 11138056          |
| 0.621         | 0.8638 | 235  | 0.9336          | 11382612          |
| 0.8007        | 0.8822 | 240  | 0.9324          | 11620516          |
| 0.691         | 0.9005 | 245  | 0.9353          | 11865444          |
| 0.7516        | 0.9189 | 250  | 0.9329          | 12109276          |
| 0.9474        | 0.9373 | 255  | 0.9326          | 12346224          |
| 0.7389        | 0.9557 | 260  | 0.9335          | 12594020          |
| 0.7986        | 0.9740 | 265  | 0.9310          | 12844164          |
| 0.9011        | 0.9924 | 270  | 0.9335          | 13090264          |
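
Validation loss drops sharply over the first ~15 steps and then flattens near 0.93 for the rest of the epoch. A small sketch for visualizing that trend, using a subsample of the (input tokens seen, validation loss) pairs transcribed from the table above:

```python
import matplotlib.pyplot as plt

# A subsample of (input tokens seen, validation loss) pairs from the table above.
tokens = [0, 480_908, 953_808, 1_934_204, 2_887_652, 3_850_976,
          4_837_704, 5_803_288, 6_773_624, 7_742_124, 8_710_912,
          9_677_156, 10_650_280, 11_620_516, 12_594_020, 13_090_264]
val_loss = [1.1282, 0.9933, 0.9788, 0.9787, 0.9710, 0.9576,
            0.9490, 0.9436, 0.9399, 0.9393, 0.9378,
            0.9336, 0.9330, 0.9324, 0.9335, 0.9335]

plt.plot(tokens, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2")
plt.show()
```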

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
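
To match this environment, the versions above can be pinned at install time (a sketch; the extra index URL is only needed for the CUDA 12.1 PyTorch build):

```
pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```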
Model size

  • 27.2B params (BF16, Safetensors)

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2

  • Base model: google/gemma-2-27b (this model is one of 33 listed fine-tunes)