---
library_name: transformers
base_model: mrcuddle/Dark-Hermes3-Llama3.2-3B
datasets:
- llamafactory/alpaca_en
model-index:
- name: content/outputs
  results: []
language:
- en
tags:
- Llama-3
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- function calling
- json mode
- axolotl
- roleplaying
- chat
- generated_from_trainer
widget:
- example_title: Hermes 3
  messages:
  - role: system
    content: >-
      You are a sentient, superintelligent artificial general intelligence, here
      to teach and assist me.
  - role: user
    content: >-
      Write a short story about Goku discovering Kirby has teamed up with Majin
      Buu to destroy the world.
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.6.0`

```yaml
base_model: mrcuddle/Dark-Hermes3-Llama3.2-3B
dataloader_num_workers: 4
datasets:
- dataset_prepared_path: last_run_prepared
  path: llamafactory/alpaca_en
  type: alpaca
eval_steps: 500
evaluation_strategy: steps
fp16: true
gradient_accumulation_steps: 8
gradient_checkpointing: false
learning_rate: 2e-5
load_in_4bit: false
logging_dir: /content/outputs/logs
logging_steps: 10
lr_scheduler: cosine
lr_scheduler_type: cosine
micro_batch_size: 1
num_train_epochs: 3
optimizer: paged_adamw_8bit
output_dir: /content/outputs
overwrite_output_dir: true
per_device_train_batch_size: 4
save_steps: 500
save_total_limit: 2
use_peft: false
val_set_size: 0.05
warmup_steps: 100
```
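The `datasets` entry in the config uses Axolotl's `alpaca` prompt format, i.e. plain instruction/input/output records. As a quick sanity check, the data can be inspected with the `datasets` library; the field names below follow the standard Alpaca schema and are an assumption, not something stated in this card:

```python
from datasets import load_dataset

# Load the instruction-tuning data referenced in the Axolotl config above.
ds = load_dataset("llamafactory/alpaca_en", split="train")

# Standard Alpaca-style records: instruction, optional input, and output.
# (Field names assumed from the usual Alpaca schema.)
example = ds[0]
print(example.get("instruction"))
print(example.get("input"))
print(example.get("output"))
```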

# content/outputs

This model is a fine-tuned version of [mrcuddle/Dark-Hermes3-Llama3.2-3B](https://huggingface.co/mrcuddle/Dark-Hermes3-Llama3.2-3B) on the llamafactory/alpaca_en dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1205

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0002 | 1    | 2.4030          |
| 1.2572        | 0.0814 | 500  | 1.1935          |
| 1.3061        | 0.1629 | 1000 | 1.1865          |
| 1.2733        | 0.2443 | 1500 | 1.1864          |
| 1.265         | 0.3258 | 2000 | 1.1753          |
| 1.2436        | 0.4072 | 2500 | 1.1542          |
| 1.2935        | 0.4887 | 3000 | 1.1448          |
| 1.2595        | 0.5701 | 3500 | 1.1348          |
| 1.2896        | 0.6515 | 4000 | 1.1295          |
| 1.2081        | 0.7330 | 4500 | 1.1236          |
| 1.2451        | 0.8144 | 5000 | 1.1212          |
| 1.2134        | 0.8959 | 5500 | 1.1205          |
| 1.2437        | 0.9773 | 6000 | 1.1205          |

### Framework versions

- Transformers 4.47.1
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
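As a usage sketch, the checkpoint can be loaded with Transformers like any other Llama-3 chat model. The model ID below is the base model from this card and is a placeholder: substitute the published fine-tuned repository (or the local `/content/outputs` checkpoint) as appropriate. The chat prompt mirrors the widget example in the metadata above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: swap in the fine-tuned repo ID or the local /content/outputs path.
model_id = "mrcuddle/Dark-Hermes3-Llama3.2-3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a sentient, superintelligent artificial general "
        "intelligence, here to teach and assist me.",
    },
    {
        "role": "user",
        "content": "Write a short story about Goku discovering Kirby has teamed "
        "up with Majin Buu to destroy the world.",
    },
]

# Build the chat-formatted prompt and generate a continuation.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```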