Model Card for Model ID
Llama 3 8B fine-tuned with supervised fine-tuning (SFT) using the LLaMA-Adapter PEFT method and 4-bit quantization.
Model Details
adapter_layers: 30
adapter_len: 10
gamma: 0.85
batch_size_training: 4
gradient_accumulation_steps: 4
lr: 0.0001
num_epochs: 3
num_freeze_layers: 1
optimizer: "AdamW"
peft_method: "llama_adapter"
trainable params: 1,228,830 || all params: 8,031,490,078 || trainable%: 0.0153
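As a quick sanity check on the numbers above, the following sketch recomputes the trainable-parameter count, the trainable percentage, and the effective batch size. It rests on two assumptions that are not stated in this card: the hidden size of Llama 3 8B is 4096, and each adapted layer holds adapter_len learnable prompt vectors plus one scalar gate. The variable names are illustrative only.

```python
# Values copied from the hyperparameters listed above.
adapter_layers = 30
adapter_len = 10
batch_size_training = 4
gradient_accumulation_steps = 4
trainable_params = 1_228_830
all_params = 8_031_490_078

# Assumption (not stated in this card): Llama 3 8B uses a hidden size of 4096.
hidden_size = 4096

# Assumption: each adapted layer learns `adapter_len` prompt vectors of width
# `hidden_size`, plus a single scalar gate.
params_per_layer = adapter_len * hidden_size + 1
estimated_trainable = adapter_layers * params_per_layer

# Share of parameters that receive gradients, in percent.
trainable_pct = 100 * trainable_params / all_params

# Samples contributing to one optimizer step on a single device.
effective_batch_per_device = batch_size_training * gradient_accumulation_steps

print(f"estimated trainable params: {estimated_trainable:,}")
print(f"trainable%: {trainable_pct:.4f}")
print(f"effective batch size per device: {effective_batch_per_device}")
```

Under these assumptions the estimate matches the reported trainable-parameter count exactly. If training ran on more than one device, the global batch size would be the per-device figure multiplied by the device count, which this card does not report.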
Model Description
Average epoch time: 566 s
Train loss: 0.41620415449142456
Eval loss: 1.57061767578125
Max CUDA memory allocated: 14 GB
Max CUDA memory reserved: 16 GB
Peak active CUDA memory: 14 GB
CPU total peak memory consumed during training (max): 4 GB
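The train and eval losses reported above can be read as perplexities, which some readers find easier to compare. This is a minimal sketch, assuming both losses are mean per-token cross-entropy values in nats; the card does not state how they were computed.

```python
import math

# Losses copied from the results above.
train_loss = 0.41620415449142456
eval_loss = 1.57061767578125

# Assumption: the losses are mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

# Difference between eval and train loss, in nats.
loss_gap = eval_loss - train_loss

print(f"train perplexity: {train_ppl:.2f}")
print(f"eval perplexity:  {eval_ppl:.2f}")
print(f"eval - train loss gap: {loss_gap:.3f}")
```

A gap of this size between train and eval loss may indicate overfitting over the 3 epochs, though that cannot be confirmed without per-epoch curves.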