llm3br256
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the spinny dataset. It achieves the following results on the evaluation set:
- Loss: 0.0079
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
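For reference, the sketch below shows how these values could be expressed with `transformers.TrainingArguments`. Only the hyperparameter values come from this card; the output directory is a placeholder, and the optimizer betas/epsilon rely on the library defaults matching the values listed above.

```python
from transformers import TrainingArguments

# Hyperparameters as listed above; output_dir is an illustrative placeholder.
training_args = TrainingArguments(
    output_dir="llm3br256",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,   # 4 x 8 = total train batch size of 32 on a single device
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```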
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.0598 | 0.0578 | 5 | 0.0557 |
0.0388 | 0.1156 | 10 | 0.0346 |
0.0275 | 0.1734 | 15 | 0.0276 |
0.0218 | 0.2312 | 20 | 0.0228 |
0.0236 | 0.2890 | 25 | 0.0203 |
0.0182 | 0.3468 | 30 | 0.0179 |
0.019 | 0.4046 | 35 | 0.0162 |
0.017 | 0.4624 | 40 | 0.0147 |
0.0147 | 0.5202 | 45 | 0.0137 |
0.0118 | 0.5780 | 50 | 0.0132 |
0.0107 | 0.6358 | 55 | 0.0127 |
0.016 | 0.6936 | 60 | 0.0123 |
0.0144 | 0.7514 | 65 | 0.0116 |
0.0119 | 0.8092 | 70 | 0.0113 |
0.0111 | 0.8671 | 75 | 0.0109 |
0.012 | 0.9249 | 80 | 0.0107 |
0.0139 | 0.9827 | 85 | 0.0102 |
0.0085 | 1.0405 | 90 | 0.0104 |
0.01 | 1.0983 | 95 | 0.0102 |
0.009 | 1.1561 | 100 | 0.0099 |
0.0094 | 1.2139 | 105 | 0.0098 |
0.0069 | 1.2717 | 110 | 0.0099 |
0.0108 | 1.3295 | 115 | 0.0096 |
0.0066 | 1.3873 | 120 | 0.0095 |
0.0089 | 1.4451 | 125 | 0.0094 |
0.0084 | 1.5029 | 130 | 0.0093 |
0.0102 | 1.5607 | 135 | 0.0093 |
0.01 | 1.6185 | 140 | 0.0091 |
0.0098 | 1.6763 | 145 | 0.0088 |
0.0071 | 1.7341 | 150 | 0.0087 |
0.0094 | 1.7919 | 155 | 0.0086 |
0.008 | 1.8497 | 160 | 0.0086 |
0.01 | 1.9075 | 165 | 0.0085 |
0.0084 | 1.9653 | 170 | 0.0086 |
0.0058 | 2.0231 | 175 | 0.0087 |
0.0056 | 2.0809 | 180 | 0.0090 |
0.0077 | 2.1387 | 185 | 0.0086 |
0.0061 | 2.1965 | 190 | 0.0086 |
0.008 | 2.2543 | 195 | 0.0083 |
0.0058 | 2.3121 | 200 | 0.0083 |
0.0047 | 2.3699 | 205 | 0.0084 |
0.0066 | 2.4277 | 210 | 0.0084 |
0.0055 | 2.4855 | 215 | 0.0082 |
0.0056 | 2.5434 | 220 | 0.0083 |
0.005 | 2.6012 | 225 | 0.0082 |
0.0065 | 2.6590 | 230 | 0.0082 |
0.0061 | 2.7168 | 235 | 0.0081 |
0.0052 | 2.7746 | 240 | 0.0082 |
0.0053 | 2.8324 | 245 | 0.0081 |
0.0058 | 2.8902 | 250 | 0.0079 |
0.0052 | 2.9480 | 255 | 0.0078 |
0.0071 | 3.0058 | 260 | 0.0080 |
0.0051 | 3.0636 | 265 | 0.0082 |
0.0033 | 3.1214 | 270 | 0.0086 |
0.004 | 3.1792 | 275 | 0.0084 |
0.0032 | 3.2370 | 280 | 0.0082 |
0.0042 | 3.2948 | 285 | 0.0082 |
0.0035 | 3.3526 | 290 | 0.0082 |
0.0041 | 3.4104 | 295 | 0.0081 |
0.0048 | 3.4682 | 300 | 0.0080 |
0.0046 | 3.5260 | 305 | 0.0080 |
0.004 | 3.5838 | 310 | 0.0080 |
0.0032 | 3.6416 | 315 | 0.0081 |
0.0039 | 3.6994 | 320 | 0.0084 |
0.0042 | 3.7572 | 325 | 0.0083 |
0.0046 | 3.8150 | 330 | 0.0080 |
0.0035 | 3.8728 | 335 | 0.0081 |
0.0048 | 3.9306 | 340 | 0.0081 |
0.0056 | 3.9884 | 345 | 0.0080 |
0.0025 | 4.0462 | 350 | 0.0080 |
0.0035 | 4.1040 | 355 | 0.0082 |
0.0028 | 4.1618 | 360 | 0.0083 |
0.0028 | 4.2197 | 365 | 0.0084 |
0.003 | 4.2775 | 370 | 0.0085 |
0.0033 | 4.3353 | 375 | 0.0085 |
0.003 | 4.3931 | 380 | 0.0086 |
0.0022 | 4.4509 | 385 | 0.0086 |
0.0028 | 4.5087 | 390 | 0.0086 |
0.0028 | 4.5665 | 395 | 0.0085 |
0.0031 | 4.6243 | 400 | 0.0085 |
0.0038 | 4.6821 | 405 | 0.0084 |
0.0024 | 4.7399 | 410 | 0.0084 |
0.0024 | 4.7977 | 415 | 0.0084 |
0.0024 | 4.8555 | 420 | 0.0084 |
0.0026 | 4.9133 | 425 | 0.0084 |
0.0029 | 4.9711 | 430 | 0.0084 |
Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
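Since the card lists PEFT among the framework versions, the adapter is presumably loaded on top of the base model with PEFT. A minimal inference sketch, assuming the adapter weights are hosted under the repository id `sizhkhy/spinny` named on this page:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/spinny"  # assumption: repository id named on this page

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```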