llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the brasingh_publicis_f5f dataset. It achieves the following results on the evaluation set:

Loss: 0.0205
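
For readers who prefer perplexity, the reported loss can be converted with a one-line calculation. This is a minimal sketch that assumes the loss is a mean per-token cross-entropy measured in nats; the card does not state this explicitly.

```python
import math

# Final evaluation loss reported above.
eval_loss = 0.0205

# Assumption: the loss is a mean per-token cross-entropy in nats,
# in which case perplexity is exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.4f}")
```

A value this close to 1 only describes the evaluation split used here, about which the card gives no further detail.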

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5.0
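
The relationship between these values can be sketched in plain Python. The block below checks how many devices the batch sizes imply and evaluates a linear-warmup, half-cosine learning-rate curve. The function names are hypothetical, the total step count is an assumption (the card only shows logs through step 410), and the formula is the common warmup-plus-cosine shape rather than a confirmed copy of the scheduler used in this run.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
TOTAL_TRAIN_BATCH_SIZE = 32
WARMUP_RATIO = 0.1

# Assumption: the card does not state the exact number of optimizer steps;
# 410 is simply the last step that appears in the training log.
ASSUMED_TOTAL_STEPS = 410


def implied_num_devices(per_device: int, grad_accum: int, total: int) -> int:
    """Devices implied by total = per_device * grad_accum * devices."""
    devices, remainder = divmod(total, per_device * grad_accum)
    if remainder:
        raise ValueError("total is not a multiple of per_device * grad_accum")
    return devices


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


devices = implied_num_devices(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, TOTAL_TRAIN_BATCH_SIZE)
print(f"implied devices: {devices}")

# Print the learning rate at a few points along the assumed schedule.
for step in (0, 20, 100, 205, 410):
    lr = cosine_lr(step, ASSUMED_TOTAL_STEPS, LEARNING_RATE, WARMUP_RATIO)
    print(f"step {step:>3}: lr = {lr:.2e}")
```

Under the usual definition of total batch size, 4 x 8 = 32 leaves room for exactly one device, which suggests a single-GPU run.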

Training results

Training Loss	Epoch	Step	Validation Loss
0.1038	0.0606	5	0.0979
0.0759	0.1212	10	0.0765
0.069	0.1818	15	0.0683
0.0729	0.2424	20	0.0620
0.0545	0.3030	25	0.0571
0.0589	0.3636	30	0.0528
0.0461	0.4242	35	0.0501
0.0522	0.4848	40	0.0493
0.052	0.5455	45	0.0483
0.0459	0.6061	50	0.0458
0.0363	0.6667	55	0.0434
0.0553	0.7273	60	0.0418
0.0444	0.7879	65	0.0403
0.0469	0.8485	70	0.0397
0.0417	0.9091	75	0.0386
0.0388	0.9697	80	0.0372
0.0309	1.0303	85	0.0358
0.0487	1.0909	90	0.0354
0.0348	1.1515	95	0.0340
0.0308	1.2121	100	0.0334
0.0318	1.2727	105	0.0330
0.028	1.3333	110	0.0322
0.0311	1.3939	115	0.0321
0.0382	1.4545	120	0.0315
0.0316	1.5152	125	0.0304
0.0278	1.5758	130	0.0299
0.0285	1.6364	135	0.0292
0.0257	1.6970	140	0.0285
0.0244	1.7576	145	0.0281
0.0256	1.8182	150	0.0278
0.0338	1.8788	155	0.0270
0.0309	1.9394	160	0.0262
0.0378	2.0	165	0.0261
0.0275	2.0606	170	0.0263
0.0225	2.1212	175	0.0259
0.0232	2.1818	180	0.0256
0.0193	2.2424	185	0.0255
0.0251	2.3030	190	0.0253
0.0228	2.3636	195	0.0249
0.0195	2.4242	200	0.0249
0.0219	2.4848	205	0.0241
0.0184	2.5455	210	0.0238
0.0199	2.6061	215	0.0236
0.023	2.6667	220	0.0232
0.0227	2.7273	225	0.0234
0.0206	2.7879	230	0.0230
0.0217	2.8485	235	0.0225
0.0186	2.9091	240	0.0224
0.0201	2.9697	245	0.0220
0.0147	3.0303	250	0.0220
0.0142	3.0909	255	0.0226
0.0149	3.1515	260	0.0218
0.0151	3.2121	265	0.0215
0.0174	3.2727	270	0.0217
0.0172	3.3333	275	0.0213
0.017	3.3939	280	0.0211
0.0223	3.4545	285	0.0212
0.0144	3.5152	290	0.0211
0.0125	3.5758	295	0.0208
0.0163	3.6364	300	0.0207
0.015	3.6970	305	0.0207
0.0154	3.7576	310	0.0206
0.0186	3.8182	315	0.0203
0.0135	3.8788	320	0.0202
0.0159	3.9394	325	0.0201
0.0211	4.0	330	0.0200
0.0134	4.0606	335	0.0202
0.0113	4.1212	340	0.0206
0.0117	4.1818	345	0.0208
0.0108	4.2424	350	0.0209
0.012	4.3030	355	0.0207
0.0111	4.3636	360	0.0206
0.0118	4.4242	365	0.0205
0.0099	4.4848	370	0.0206
0.0118	4.5455	375	0.0206
0.0119	4.6061	380	0.0206
0.0114	4.6667	385	0.0206
0.0109	4.7273	390	0.0206
0.0124	4.7879	395	0.0205
0.0111	4.8485	400	0.0206
0.012	4.9091	405	0.0206
0.0104	4.9697	410	0.0205
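
To locate the best checkpoint in a log like the one above, the rows can be parsed and ranked by validation loss. The sketch below uses a six-row excerpt of the table (the first and last logged steps plus rows near each epoch boundary); the helper names `parse_rows` and `best_row` are hypothetical.

```python
# Excerpt of the training results table above.
# Columns: training loss, epoch, step, validation loss.
LOG_EXCERPT = """
0.1038 0.0606 5 0.0979
0.0388 0.9697 80 0.0372
0.0378 2.0 165 0.0261
0.0147 3.0303 250 0.0220
0.0211 4.0 330 0.0200
0.0104 4.9697 410 0.0205
"""


def parse_rows(text):
    """Turn whitespace-separated log lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row["val_loss"])


rows = parse_rows(LOG_EXCERPT)
best = best_row(rows)
final = rows[-1]
print(f"lowest validation loss {best['val_loss']:.4f} at step {best['step']}")
print(f"final validation minus training loss: {final['val_loss'] - final['train_loss']:.4f}")
```

In the full table the minimum validation loss is also 0.0200 at step 330. After that point the validation loss stays between 0.0202 and 0.0209 while the training loss keeps falling, so the fifth epoch brings little measurable gain on the evaluation set.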

Framework versions

PEFT 0.12.0
Transformers 4.46.1
Pytorch 2.4.0+cu121
Datasets 3.1.0
Tokenizers 0.20.3
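
When reproducing the run, it can help to compare the local environment against the versions listed above. This is a minimal sketch using only the standard library; the lowercase distribution names (for example `torch` for Pytorch) are assumptions about how the packages are installed, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions from the list above. The distribution names are assumptions.
PINNED = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.4.0+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def version_mismatches(pinned):
    """Return {name: (expected, installed or None)} for each package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in version_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

A mismatch is not necessarily a problem; the check only makes differences visible before debugging a loading or training issue.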
