DeepSeek-R1-Distill-Qwen-1.5B-sft / train_results.json
{
    "epoch": 2.0,
    "total_flos": 1.317227096577147e+18,
    "train_loss": 2.02712393619128,
    "train_runtime": 22983.8323,
    "train_samples_per_second": 15.776,
    "train_steps_per_second": 0.247
}
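
The throughput fields above can be combined into a few rough derived quantities. The following is a minimal sketch, assuming `train_runtime` is measured in seconds and that the per-second rates are averages over the whole run. The helper name `derive` and the keys of the dictionary it returns are hypothetical and are not part of the file. Because the rates are rounded in the file, every derived value is approximate.

```python
import json

# Metrics copied verbatim from the JSON above.
RESULTS_JSON = """
{
    "epoch": 2.0,
    "total_flos": 1.317227096577147e+18,
    "train_loss": 2.02712393619128,
    "train_runtime": 22983.8323,
    "train_samples_per_second": 15.776,
    "train_steps_per_second": 0.247
}
"""


def derive(results):
    """Compute rough derived quantities (hypothetical helper, not part of the file)."""
    runtime = results["train_runtime"]  # assumed to be in seconds
    samples_per_s = results["train_samples_per_second"]
    steps_per_s = results["train_steps_per_second"]
    total_samples = samples_per_s * runtime
    return {
        "runtime_hours": runtime / 3600,
        "approx_total_steps": steps_per_s * runtime,
        "approx_total_samples": total_samples,
        "approx_samples_per_epoch": total_samples / results["epoch"],
        # Ratio of the two rates: samples consumed per optimizer step.
        "approx_samples_per_step": samples_per_s / steps_per_s,
        "approx_flos_per_second": results["total_flos"] / runtime,
    }


derived = derive(json.loads(RESULTS_JSON))
for name, value in derived.items():
    print(f"{name}: {value:.4g}")
```

The ratio of the two rates suggests roughly 64 samples per optimizer step, but the file itself does not state the batch size or gradient accumulation settings, so that figure is an inference and not a recorded value.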