fc40899b-5ef2-4690-b3e4-4bd9b362eda1

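This model is a fine-tuned version of HuggingFaceH4/tiny-random-LlamaForCausalLM on an unspecified dataset (the training metadata records the dataset as None). It achieves the following results on the evaluation set:

Loss: 10.3384 (validation loss at step 500)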

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.000207
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: OptimizerNames.ADAMW_BNB with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
training_steps: 500
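
The relationship between these values can be sketched in a few lines of plain Python. This is a minimal illustration, not the training code itself: it assumes a single device (so that 4 x 2 gives the listed total of 8) and assumes the cosine scheduler follows the common pattern of linear warmup followed by a half-cosine decay to zero. The function names `effective_batch_size` and `cosine_lr` are hypothetical helpers introduced only for this example.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.000207
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 50
TRAINING_STEPS = 500


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation * num_devices


def cosine_lr(step: int) -> float:
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup from 0 to the peak rate over WARMUP_STEPS,
    then a half-cosine decay from the peak down to 0 at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
for step in (0, 25, 50, 275, 500):
    print(f"step {step:3d}: lr = {cosine_lr(step):.6e}")
```

Under these assumptions the rate peaks at step 50, is half the peak at step 275 (the midpoint of the decay), and reaches zero at step 500.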

Training Loss	Epoch	Step	Validation Loss
No log	0.0002	1	10.3744
10.3428	0.0082	50	10.3580
10.3326	0.0164	100	10.3471
10.3325	0.0245	150	10.3437
10.3307	0.0327	200	10.3412
10.3322	0.0409	250	10.3399
10.3319	0.0491	300	10.3393
10.3282	0.0572	350	10.3387
10.3256	0.0654	400	10.3384
10.3212	0.0736	450	10.3384
10.3224	0.0818	500	10.3384
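
To put these losses in context, the sketch below compares them with the loss of a model that predicts every token uniformly at random, and estimates how much of the dataset was seen. It rests on assumptions not stated in this card: that the reported loss is mean cross-entropy in nats, and that the base model's vocabulary has 32,000 entries (`VOCAB_SIZE` below is that assumed value). The steps-per-epoch figure is only approximate because the Epoch column is rounded to four decimals.

```python
import math

VOCAB_SIZE = 32000        # assumption: vocabulary size of the base model, not given in this card
FIRST_EVAL_LOSS = 10.3744  # validation loss at step 1 (from the table)
FINAL_EVAL_LOSS = 10.3384  # validation loss at step 500 (from the table)
FINAL_EPOCH = 0.0818       # epoch value at step 500 (from the table, rounded)
FINAL_STEP = 500
TOTAL_TRAIN_BATCH_SIZE = 8


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


# Loss of a uniform guess over the assumed vocabulary: ln(VOCAB_SIZE).
uniform_loss = math.log(VOCAB_SIZE)

# Rough dataset size implied by the Epoch column.
steps_per_epoch = FINAL_STEP / FINAL_EPOCH
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"uniform-guess loss: {uniform_loss:.4f}")
print(f"first eval perplexity: {perplexity(FIRST_EVAL_LOSS):.0f}")
print(f"final eval perplexity: {perplexity(FINAL_EVAL_LOSS):.0f}")
print(f"approx. optimizer steps per epoch: {steps_per_epoch:.0f}")
print(f"approx. training examples per epoch: {approx_train_examples:.0f}")
```

If the vocabulary assumption holds, the first validation loss sits almost exactly at the uniform-guess level and the final loss is only slightly below it, which is consistent with a tiny randomly initialized model trained for 500 steps on under a tenth of an epoch.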