Model Details

Model Description

A Trismegistus fine-tune of Llama 3.2 1B. Credits to teknium for the dataset and the original Trismegistus model.

Model Sources

Base model: Llama 3.2 1B

Uses

  • Use for esoteric joy.

Bias, Risks, and Limitations

  • May be biased as hell.

  • Recommendation:

    • Don't take it personally.

How to Get Started with the Model

  • Run it.
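A minimal sketch of "run it" using the `transformers` library, assuming the Hugging Face repo id `jtatman/llama-3.2-1b-trismegistus`; the prompt and sampling settings are illustrative, not prescribed by this card:

```python
# Minimal inference sketch with the transformers library.
# MODEL_ID is assumed to be this card's repo id; the prompt is illustrative.
MODEL_ID = "jtatman/llama-3.2-1b-trismegistus"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the checkpoint (on first call) and sample a completion."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # third-party

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage (downloads ~1.24B params of weights on first call):
# print(generate("Who was Hermes Trismegistus?"))
```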

Training Data

  • teknium's Trismegistus dataset (per the credits above).

Training Hyperparameters

  • LoRA adapters trained on a 4-bit quantized base model via PEFT.
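The card doesn't publish the adapter hyperparameters, so here is a hedged sketch of what a 4-bit LoRA (QLoRA-style) setup typically looks like with `peft` and `bitsandbytes`. The rank, alpha, dropout, and target modules below are illustrative assumptions, not the values used for this run:

```python
# Illustrative QLoRA-style settings; every value here is an assumption,
# NOT the configuration used to train this checkpoint.
quant_config = dict(       # kwargs for transformers.BitsAndBytesConfig
    load_in_4bit=True,             # quantize the frozen base model to 4-bit
    bnb_4bit_quant_type="nf4",     # NormalFloat4 quantization
    bnb_4bit_compute_dtype="bfloat16",
)

lora_config = dict(        # kwargs for peft.LoraConfig
    r=16,                  # adapter rank (assumed)
    lora_alpha=32,         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attn projections
    task_type="CAUSAL_LM",
)
```

With this pattern only the small LoRA adapter weights are updated while the 4-bit base stays frozen, which is what makes a 1B-parameter fine-tune feasible on a single consumer GPU.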

Speeds, Sizes, Times

  • global_step: 16905
  • train_loss: 1.169401215731269
  • train_runtime: 21882.4747 s
  • train_samples_per_second: 3.09
  • train_steps_per_second: 0.773
  • total_flos: 4.437195883099177e+17
  • epoch: 5.0
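The throughput numbers above are self-consistent; a quick arithmetic cross-check:

```python
# Cross-check the reported training statistics (pure arithmetic).
global_step = 16905
train_runtime_s = 21882.4747
samples_per_second = 3.09
epochs = 5.0

steps_per_second = global_step / train_runtime_s    # matches the logged 0.773
total_hours = train_runtime_s / 3600                # total wall-clock training time
samples_seen = samples_per_second * train_runtime_s
samples_per_epoch = samples_seen / epochs           # ≈ 13,500 examples per epoch

print(round(steps_per_second, 3), round(total_hours, 2))  # → 0.773 6.08
```

So the run covered five epochs in roughly six hours.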

Evaluation and Metrics

Tasks          Version  Filter  n-shot  Metric       Value    Stderr
arc_challenge  1        none    0       acc ↑        0.3345   ± 0.0138
                        none    0       acc_norm ↑   0.3695   ± 0.0141
arc_easy       1        none    0       acc ↑        0.6044   ± 0.0100
                        none    0       acc_norm ↑   0.5694   ± 0.0102
boolq          2        none    0       acc ↑        0.6410   ± 0.0084
hellaswag      1        none    0       acc ↑        0.4400   ± 0.0050
                        none    0       acc_norm ↑   0.5728   ± 0.0049
openbookqa     1        none    0       acc ↑        0.2260   ± 0.0187
                        none    0       acc_norm ↑   0.3540   ± 0.0214
piqa           1        none    0       acc ↑        0.7002   ± 0.0107
                        none    0       acc_norm ↑   0.7024   ± 0.0107
winogrande     1        none    0       acc ↑        0.5785   ± 0.0139
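The table layout matches the output of EleutherAI's lm-evaluation-harness. A hedged sketch of the command that would reproduce these zero-shot numbers, assuming the harness is installed and the repo id below is correct (exact harness version and batch settings for the original run are not published):

```shell
pip install lm-eval

lm_eval --model hf \
  --model_args pretrained=jtatman/llama-3.2-1b-trismegistus \
  --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande \
  --num_fewshot 0 \
  --batch_size auto
```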

Environmental Impact

Will steal your horse and kill your cat.

Model: jtatman/llama-3.2-1b-trismegistus
Model size: 1.24B params (F32, safetensors)