# collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9331
- Num Input Tokens Seen: 13190464
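
A minimal loading sketch with the `transformers` library is shown below; the repo id is taken from this card, while the dtype and device settings are illustrative assumptions rather than part of the original card.

```python
# Minimal loading sketch (assumptions: the checkpoint is hosted on the Hugging Face
# Hub under the repo id below, and enough GPU memory is available for a 27B model in bf16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32 for a 27B-parameter model
    device_map="auto",           # shard layers across available devices
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```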
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
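
For reference, here is a hypothetical reconstruction of how these values might be expressed with `transformers.TrainingArguments`; the actual training script is not included in this card, and the `output_dir` name is illustrative.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters above onto TrainingArguments;
# the original training script is not part of this card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd2",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```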
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.3244        | 0.0184 | 5    | 1.0518          | 240912            |
| 2.2442        | 0.0368 | 10   | 0.9933          | 480908            |
| 2.1347        | 0.0551 | 15   | 0.9797          | 713948            |
| 2.0779        | 0.0735 | 20   | 0.9788          | 953808            |
| 1.6988        | 0.0919 | 25   | 0.9776          | 1202776           |
| 1.6197        | 0.1103 | 30   | 0.9794          | 1447736           |
| 1.5939        | 0.1286 | 35   | 0.9787          | 1694460           |
| 1.391         | 0.1470 | 40   | 0.9787          | 1934204           |
| 1.1954        | 0.1654 | 45   | 0.9771          | 2171112           |
| 1.1232        | 0.1838 | 50   | 0.9747          | 2409548           |
| 1.1961        | 0.2022 | 55   | 0.9722          | 2648484           |
| 0.9664        | 0.2205 | 60   | 0.9710          | 2887652           |
| 1.1064        | 0.2389 | 65   | 0.9667          | 3127516           |
| 1.0085        | 0.2573 | 70   | 0.9611          | 3368304           |
| 0.8056        | 0.2757 | 75   | 0.9606          | 3603000           |
| 0.9106        | 0.2941 | 80   | 0.9576          | 3850976           |
| 0.9384        | 0.3124 | 85   | 0.9544          | 4094752           |
| 0.8953        | 0.3308 | 90   | 0.9521          | 4345860           |
| 0.8928        | 0.3492 | 95   | 0.9511          | 4588756           |
| 0.7887        | 0.3676 | 100  | 0.9490          | 4837704           |
| 0.9092        | 0.3859 | 105  | 0.9497          | 5078112           |
| 0.7458        | 0.4043 | 110  | 0.9471          | 5318968           |
| 0.762         | 0.4227 | 115  | 0.9463          | 5556324           |
| 0.8916        | 0.4411 | 120  | 0.9436          | 5803288           |
| 0.791         | 0.4595 | 125  | 0.9442          | 6042868           |
| 0.9366        | 0.4778 | 130  | 0.9417          | 6282932           |
| 0.8494        | 0.4962 | 135  | 0.9418          | 6522180           |
| 1.0078        | 0.5146 | 140  | 0.9399          | 6773624           |
| 0.9159        | 0.5330 | 145  | 0.9380          | 7011976           |
| 1.0115        | 0.5513 | 150  | 0.9390          | 7257008           |
| 0.84          | 0.5697 | 155  | 0.9380          | 7501580           |
| 0.8987        | 0.5881 | 160  | 0.9393          | 7742124           |
| 0.9589        | 0.6065 | 165  | 0.9370          | 7981768           |
| 0.8201        | 0.6249 | 170  | 0.9371          | 8222304           |
| 0.7601        | 0.6432 | 175  | 0.9348          | 8469856           |
| 0.7465        | 0.6616 | 180  | 0.9378          | 8710912           |
| 0.8689        | 0.6800 | 185  | 0.9381          | 8949132           |
| 0.6945        | 0.6984 | 190  | 0.9343          | 9196744           |
| 0.7289        | 0.7167 | 195  | 0.9358          | 9434412           |
| 0.583         | 0.7351 | 200  | 0.9336          | 9677156           |
| 0.6272        | 0.7535 | 205  | 0.9356          | 9916792           |
| 0.7919        | 0.7719 | 210  | 0.9353          | 10162084          |
| 0.9377        | 0.7903 | 215  | 0.9334          | 10403240          |
| 0.7397        | 0.8086 | 220  | 0.9330          | 10650280          |
| 0.6871        | 0.8270 | 225  | 0.9342          | 10885396          |
| 0.9175        | 0.8454 | 230  | 0.9339          | 11138056          |
| 0.621         | 0.8638 | 235  | 0.9336          | 11382612          |
| 0.8007        | 0.8822 | 240  | 0.9324          | 11620516          |
| 0.691         | 0.9005 | 245  | 0.9353          | 11865444          |
| 0.7516        | 0.9189 | 250  | 0.9329          | 12109276          |
| 0.9474        | 0.9373 | 255  | 0.9326          | 12346224          |
| 0.7389        | 0.9557 | 260  | 0.9335          | 12594020          |
| 0.7986        | 0.9740 | 265  | 0.9310          | 12844164          |
| 0.9011        | 0.9924 | 270  | 0.9335          | 13090264          |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
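
To verify a local environment against these pins, a small check along the following lines can be used (module names are the standard import names for the packages listed above):

```python
# Sanity-check that the local environment matches the versions pinned above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.44.0",
    torch: "2.4.0+cu121",
    datasets: "2.20.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    assert module.__version__ == version, (
        f"{module.__name__}: found {module.__version__}, expected {version}"
    )
```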