Built with Axolotl

9eb7c03b-5f54-4c10-89f5-7d18d5b90118

This model is a fine-tuned version of HuggingFaceH4/tiny-random-LlamaForCausalLM on an unspecified dataset (the dataset field in the card metadata is None). It achieves the following results on the evaluation set:

  • Loss: 10.3529
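
Since PEFT appears in the framework versions below, this repository most likely hosts a parameter-efficient adapter on top of the base model rather than full fine-tuned weights. A minimal loading sketch under that assumption, using the repository id from its Hub name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumption: this repo contains a PEFT adapter for the tiny random base model.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/tiny-random-LlamaForCausalLM")
model = PeftModel.from_pretrained(base, "lesso08/9eb7c03b-5f54-4c10-89f5-7d18d5b90118")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/tiny-random-LlamaForCausalLM")
```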

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.000208
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
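
As a rough sketch, these settings map onto transformers TrainingArguments as below. Field names follow transformers 4.46.0, and output_dir is a placeholder, not a value from the original run:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",            # placeholder, not from the original run
    learning_rate=0.000208,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 4 * 2 = 8
    optim="adamw_bnb_8bit",          # OptimizerNames.ADAMW_BNB
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```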

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0004 | 1    | 10.3759         |
| 10.3612       | 0.0182 | 50   | 10.3642         |
| 10.3606       | 0.0365 | 100  | 10.3606         |
| 10.3554       | 0.0547 | 150  | 10.3574         |
| 10.3516       | 0.0730 | 200  | 10.3555         |
| 10.3511       | 0.0912 | 250  | 10.3545         |
| 10.3519       | 0.1095 | 300  | 10.3539         |
| 10.3531       | 0.1277 | 350  | 10.3534         |
| 10.3496       | 0.1460 | 400  | 10.3532         |
| 10.3515       | 0.1642 | 450  | 10.3531         |
| 10.3497       | 0.1824 | 500  | 10.3529         |
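
For scale, and assuming the reported loss is mean token-level cross-entropy in nats (the transformers Trainer convention), the final validation loss corresponds to a perplexity of roughly exp(10.3529) ≈ 3.1 × 10^4, in line with a tiny randomly initialized base model:

```python
import math

# Perplexity implied by the final validation loss, assuming cross-entropy in nats.
print(math.exp(10.3529))  # ≈ 3.13e4
```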

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • PyTorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1