yeah, my mental state when things do not go well

70B-L3.3-mhnnn-x1

I quite liked it after messing around with it. Same data composition as Freya, applied differently.

It has occasional brain farts that are fixed with a regen; that's the price for more creative outputs.

Recommended Model Settings | Look, I just use these; they work well enough. I don't even know how DRY or the other meme samplers work. Your system prompt matters more anyway.

Prompt Format: Llama-3-Instruct
Temperature: 1.1
min_p: 0.05
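
For reference, here's a minimal sketch of plugging those settings into transformers. The system/user prompt is made up, and min_p support assumes a recent transformers release; adjust dtype and device_map to whatever your hardware can actually fit a 70B into.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/70B-L3.3-mhnnn-x1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama-3-Instruct prompt format via the bundled chat template.
messages = [
    {"role": "system", "content": "You are a creative writing assistant."},
    {"role": "user", "content": "Write the opening paragraph of a heist story."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.1,
    min_p=0.05,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))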

Types of Data included within Sets

Completion - Novels / eBooks
Text Adventure - Include details like 'Text Adventure Narrator' in the System Prompt and give it a one-shot example, and it'll fly (see the sketch after this list).
Amoral Assistant - Include the terms 'Amoral' and 'Neutral' along with the regular assistant prompt for better results.
Instruct / Assistant - The usual assistant tasks.
Roleplay - The usual, regular sets.
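
If you want something concrete for the text-adventure setup, here's a minimal sketch of a system prompt plus a one-shot example pushed through the Llama-3 chat template. The exact wording is my own assumption, not something baked into the model; the point is just "Text Adventure Narrator" in the system prompt and one example turn.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Sao10K/70B-L3.3-mhnnn-x1")

messages = [
    {"role": "system", "content": "You are a Text Adventure Narrator. Describe scenes in second person and end each turn with a choice for the player."},
    # One-shot example turn so the model locks onto the format.
    {"role": "user", "content": "look around"},
    {"role": "assistant", "content": "You stand in a torchlit corridor. Water drips somewhere ahead. Do you press on, or search the alcove to your left?"},
    {"role": "user", "content": "search the alcove"},
]

# Llama-3-Instruct formatted string, ready for whatever backend you use.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)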

Total training time was ~14 hours on an 8xH100 node. Shout out to SCDF for not sponsoring this run; my funds are dry from doing random things.

https://sao10k.carrd.co/ for contact.


Built with Axolotl

See axolotl config

axolotl version: 0.6.0

adapter: lora # 16-bit
lora_r: 64
lora_alpha: 64
lora_dropout: 0.2
peft_use_rslora: true
lora_target_linear: true
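# Note: with peft_use_rslora, PEFT scales the LoRA update by
# lora_alpha / sqrt(lora_r) = 64 / 8 = 8, instead of the standard
# lora_alpha / lora_r = 1.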
  
# Data
dataset_prepared_path: dataset_run_freya
datasets:
# S1 - Writing / Completion
  - path: datasets/eBooks-cleaned-75K
    type: completion
  - path: datasets/novels-clean-dedupe-10K
    type: completion
# S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
shuffle_merged_datasets: true
warmup_ratio: 0.1

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs: 1

# Sampling
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_checkpointing: unsloth
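# Effective global batch = micro_batch_size (2) x gradient_accumulation_steps (4)
# x 8 GPUs = 64 sequences per optimizer step, assuming data parallelism across
# the full 8xH100 node mentioned above.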

# Evaluation
val_set_size: 0.025
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00000242
weight_decay: 0.2
max_grad_norm: 10.0

# Garbage Collection
gc_steps: 10

# Misc
deepspeed: ./deepspeed_configs/zero3_bf16.json
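
For anyone reusing the config: the chat_template dataset entries above expect ShareGPT-style records, i.e. a "conversations" list with "from"/"value" fields, with only the "gpt" turns trained on. A minimal sketch of one record; the content is made up, only the field names come from the config.

# Illustrative record shape for the chat_template datasets above.
sample_record = {
    "conversations": [
        {"from": "system", "value": "You are a neutral, amoral assistant."},
        {"from": "human", "value": "Summarize the plot of Frankenstein in two sentences."},
        {"from": "gpt", "value": "Victor Frankenstein builds a creature and abandons it in horror. The creature, rejected by everyone, turns on its maker and destroys all he loves."},
    ]
}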
