v3c_llama_lora

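This model is a fine-tuned version of mtzig/prm800k_llama_debug_full on an unknown dataset. At the last logged evaluation (step 800, epoch 0.9837), it achieves the following results on the evaluation set:

Loss: 0.4195
Accuracy: 0.8128
Precision: 0.7778
Recall: 0.42
F1: 0.5455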

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 765837
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
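
The relationship between the per-device and total batch sizes, and the shape of the learning-rate schedule, can be sketched in plain Python. This is a minimal illustration, not the training code itself. The function name `cosine_lr_with_warmup`, the rounding of warmup steps with `ceil`, and the total step count of 813 (estimated from step 800 falling at epoch 0.9837) are assumptions that the card does not state.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 4
base_lr = 2e-05
warmup_ratio = 0.1

# Effective batch sizes: gradient accumulation only applies to training.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def cosine_lr_with_warmup(step, total_steps, peak_lr=base_lr, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to zero."""
    # ASSUMPTION: warmup steps are rounded up from total_steps * ratio.
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: the card does not report the total number of optimizer steps;
# 813 is an estimate derived from step 800 corresponding to epoch 0.9837.
assumed_total_steps = 813

if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_eval_batch_size:", total_eval_batch_size)
    for step in (0, 40, 400, 800):
        lr = cosine_lr_with_warmup(step, assumed_total_steps)
        print(f"step {step}: lr = {lr:.3e}")
```

Under these assumptions the learning rate rises linearly over roughly the first tenth of training, peaks at 2e-05, and then decays toward zero by the end of the single epoch.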

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
No log	0	0	0.6173	0.7487	1.0	0.06	0.1132
0.3808	0.0492	40	0.5695	0.7487	0.8	0.08	0.1455
0.3036	0.0984	80	0.4816	0.7647	0.6364	0.28	0.3889
0.305	0.1476	120	0.4852	0.8021	0.7241	0.42	0.5316
0.256	0.1967	160	0.4328	0.8021	0.7826	0.36	0.4932
0.2062	0.2459	200	0.4699	0.7861	0.75	0.3	0.4286
0.2004	0.2951	240	0.4480	0.7807	0.7143	0.3	0.4225
0.2241	0.3443	280	0.4449	0.7807	0.7143	0.3	0.4225
0.1505	0.3935	320	0.4088	0.8182	0.75	0.48	0.5854
0.1752	0.4427	360	0.4386	0.7861	0.75	0.3	0.4286
0.2382	0.4919	400	0.4186	0.8128	0.7778	0.42	0.5455
0.238	0.5410	440	0.4313	0.7914	0.7391	0.34	0.4658
0.1448	0.5902	480	0.4161	0.8128	0.7778	0.42	0.5455
0.2096	0.6394	520	0.4251	0.7968	0.75	0.36	0.4865
0.204	0.6886	560	0.4413	0.7914	0.7391	0.34	0.4658
0.1545	0.7378	600	0.4312	0.7968	0.75	0.36	0.4865
0.1883	0.7870	640	0.4288	0.8021	0.76	0.38	0.5067
0.2403	0.8362	680	0.4288	0.8021	0.76	0.38	0.5067
0.1937	0.8853	720	0.4245	0.8021	0.76	0.38	0.5067
0.164	0.9345	760	0.4182	0.8075	0.7692	0.4	0.5263
0.2185	0.9837	800	0.4195	0.8128	0.7778	0.42	0.5455
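
The F1 column is the harmonic mean of the Precision and Recall columns, so any row can be checked by hand. The following minimal sketch recomputes F1 for three rows copied from the table above. The helper name `f1_from_precision_recall` and the rounding tolerance are illustrative choices, not part of the original training or evaluation code.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (step, precision, recall, reported F1), copied from the results table.
rows = [
    (0, 1.0, 0.06, 0.1132),
    (320, 0.75, 0.48, 0.5854),
    (800, 0.7778, 0.42, 0.5455),
]

# The table values are rounded to four decimals, so allow a small tolerance.
TOLERANCE = 5e-4

if __name__ == "__main__":
    for step, precision, recall, reported in rows:
        computed = f1_from_precision_recall(precision, recall)
        matches = abs(computed - reported) < TOLERANCE
        print(f"step {step}: computed F1 = {computed:.4f}, "
              f"reported = {reported}, match = {matches}")
```

Because recall stays well below precision throughout the table, F1 is pulled toward the lower recall values. This is why accuracy above 0.8 coincides with F1 near 0.55.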