llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the goavanto2-oneshot-train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0032
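
Since the framework versions below include PEFT and the card lists this checkpoint as an adapter, it can presumably be loaded as a LoRA-style adapter on top of the base model. The sketch below is a minimal illustration, not the author's exact usage: the repository id neel-nanonets/goavanto_2 is taken from this card's adapter listing, and the prompt and generation settings are placeholders.

```python
# Minimal sketch: load the PEFT adapter on top of the base model (assumptions noted above).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "neel-nanonets/goavanto_2"  # assumed adapter repository id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Illustrative prompt; replace with an input matching the training task.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```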

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
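
For reference, a `TrainingArguments` configuration matching the hyperparameters above might look like the following. This is a sketch only: the output directory is illustrative, and the dataset/model wiring of the actual training script is not shown.

```python
# Sketch of Trainer arguments equivalent to the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",          # illustrative path
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,   # effective total train batch size: 4 * 8 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```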

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0512        | 0.0741 | 5    | 0.0457          |
| 0.034         | 0.1481 | 10   | 0.0291          |
| 0.0258        | 0.2222 | 15   | 0.0234          |
| 0.0185        | 0.2963 | 20   | 0.0186          |
| 0.014         | 0.3704 | 25   | 0.0155          |
| 0.0178        | 0.4444 | 30   | 0.0133          |
| 0.0157        | 0.5185 | 35   | 0.0116          |
| 0.014         | 0.5926 | 40   | 0.0102          |
| 0.0098        | 0.6667 | 45   | 0.0091          |
| 0.0074        | 0.7407 | 50   | 0.0082          |
| 0.007         | 0.8148 | 55   | 0.0076          |
| 0.0078        | 0.8889 | 60   | 0.0073          |
| 0.0095        | 0.9630 | 65   | 0.0070          |
| 0.0064        | 1.0370 | 70   | 0.0067          |
| 0.0114        | 1.1111 | 75   | 0.0064          |
| 0.0059        | 1.1852 | 80   | 0.0060          |
| 0.0091        | 1.2593 | 85   | 0.0059          |
| 0.0051        | 1.3333 | 90   | 0.0055          |
| 0.0093        | 1.4074 | 95   | 0.0054          |
| 0.0048        | 1.4815 | 100  | 0.0051          |
| 0.0042        | 1.5556 | 105  | 0.0050          |
| 0.0044        | 1.6296 | 110  | 0.0049          |
| 0.0047        | 1.7037 | 115  | 0.0048          |
| 0.0047        | 1.7778 | 120  | 0.0047          |
| 0.0054        | 1.8519 | 125  | 0.0046          |
| 0.0042        | 1.9259 | 130  | 0.0043          |
| 0.0053        | 2.0    | 135  | 0.0043          |
| 0.0023        | 2.0741 | 140  | 0.0043          |
| 0.0053        | 2.1481 | 145  | 0.0043          |
| 0.0029        | 2.2222 | 150  | 0.0042          |
| 0.0036        | 2.2963 | 155  | 0.0041          |
| 0.0035        | 2.3704 | 160  | 0.0041          |
| 0.0031        | 2.4444 | 165  | 0.0041          |
| 0.003         | 2.5185 | 170  | 0.0040          |
| 0.0039        | 2.5926 | 175  | 0.0040          |
| 0.0036        | 2.6667 | 180  | 0.0038          |
| 0.0042        | 2.7407 | 185  | 0.0037          |
| 0.0032        | 2.8148 | 190  | 0.0036          |
| 0.0041        | 2.8889 | 195  | 0.0036          |
| 0.0053        | 2.9630 | 200  | 0.0035          |
| 0.0036        | 3.0370 | 205  | 0.0034          |
| 0.0054        | 3.1111 | 210  | 0.0035          |
| 0.0047        | 3.1852 | 215  | 0.0036          |
| 0.0022        | 3.2593 | 220  | 0.0034          |
| 0.003         | 3.3333 | 225  | 0.0034          |
| 0.0019        | 3.4074 | 230  | 0.0033          |
| 0.0034        | 3.4815 | 235  | 0.0034          |
| 0.0025        | 3.5556 | 240  | 0.0033          |
| 0.002         | 3.6296 | 245  | 0.0033          |
| 0.0015        | 3.7037 | 250  | 0.0033          |
| 0.0027        | 3.7778 | 255  | 0.0033          |
| 0.0015        | 3.8519 | 260  | 0.0032          |
| 0.0017        | 3.9259 | 265  | 0.0032          |
| 0.0027        | 4.0    | 270  | 0.0031          |
| 0.0014        | 4.0741 | 275  | 0.0031          |
| 0.0015        | 4.1481 | 280  | 0.0032          |
| 0.0014        | 4.2222 | 285  | 0.0032          |
| 0.002         | 4.2963 | 290  | 0.0033          |
| 0.0021        | 4.3704 | 295  | 0.0033          |
| 0.0035        | 4.4444 | 300  | 0.0032          |
| 0.0014        | 4.5185 | 305  | 0.0032          |
| 0.0023        | 4.5926 | 310  | 0.0032          |
| 0.0016        | 4.6667 | 315  | 0.0032          |
| 0.0016        | 4.7407 | 320  | 0.0032          |
| 0.0015        | 4.8148 | 325  | 0.0032          |
| 0.0014        | 4.8889 | 330  | 0.0032          |
| 0.0017        | 4.9630 | 335  | 0.0032          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • PyTorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
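
To recreate a compatible environment, a requirements pin matching the versions above might look like the following sketch. Note the training run used the `+cu121` PyTorch build, which comes from the CUDA 12.1 wheel index rather than a plain PyPI pin.

```text
peft==0.12.0
transformers==4.46.1
torch==2.4.0        # trained with 2.4.0+cu121; install the cu121 build for CUDA parity
datasets==3.1.0
tokenizers==0.20.3
```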