---
tags:
- GGUF
- iMat
- Llama3
- conversational
---

```
  e88 88e                               d8
 d888 888b  8888 8888 ,"Y88b  888 8e   d88
C8888 8888D 8888 8888 "8" 888 888 88b d88888
 Y888 888P  Y888 888P ,ee 888 888 888  888
  "88 88"    "88 88"  "88 888 888 888  888
      b
      8b,

  e88'Y88                  d8           888
 d888  'Y ,"Y88b  888,8,  d88    ,e e,  888
C8888     "8" 888 888 "  d88888 d88 88b 888
 Y888  ,d ,ee 888 888     888   888   , 888
  "88,d88 "88 888 888     888    "YeeP" 888

             PROUDLY PRESENTS
```


## experiment_1_8b-iMat-GGUF

<b>Quantization note: for best results, use a repetition penalty of ~1.15 (--repeat-penalty in llama.cpp).</b>
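
For context, a repetition penalty rescales the logits of tokens that already appeared in the recent context before the next token is sampled. The sketch below is a minimal, dependency-free illustration of the common formulation (positive logits are divided by the penalty, non-positive ones are multiplied by it). The function name `apply_repeat_penalty` and the toy logits are illustrative assumptions, not llama.cpp's actual API.

```python
def apply_repeat_penalty(logits, recent_token_ids, penalty=1.15):
    """Return a copy of `logits` with recently seen tokens made less likely.

    `logits` is a list of floats indexed by token id; `recent_token_ids`
    are the ids that appeared in the recent context window.
    """
    penalized = list(logits)  # leave the caller's list untouched
    for token_id in set(recent_token_ids):  # penalize each seen token once
        if penalized[token_id] > 0:
            penalized[token_id] /= penalty  # shrink positive logits
        else:
            penalized[token_id] *= penalty  # push non-positive logits lower
    return penalized


# Toy vocabulary of four tokens; tokens 0 and 1 were generated recently.
toy_logits = [2.3, -1.0, 0.5, 4.6]
print(apply_repeat_penalty(toy_logits, recent_token_ids=[0, 1, 0]))
```

A penalty of 1.0 leaves the logits unchanged; values slightly above 1.0, such as the ~1.15 suggested here, discourage loops without banning repeated tokens outright.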

Quantized from fp16 with love.

* Weighted quantizations were created from the fp16 GGUF, using an importance matrix computed on [groups_merged-enhancedV2-TurboMini.txt](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-9432658) in 189 chunks with n_ctx=512.
* This method of calculating the importance matrix showed improvements in some areas for Mistral 7b and Llama3 8b models; see the linked post for details.
* The enhancedV2-TurboMini file appends snippets from turboderp's calibration data to the standard groups_merged.txt file.
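
To make the "189 chunks with n_ctx=512" figure concrete: the calibration text is tokenized and processed in fixed-size windows of `n_ctx` tokens, so this run covered roughly 189 × 512 = 96,768 tokens. Below is a minimal sketch of that chunking, assuming any trailing partial window is dropped. `split_into_chunks` and the stand-in token list are hypothetical and not part of llama.cpp.

```python
def split_into_chunks(token_ids, n_ctx=512):
    """Split a token list into consecutive windows of exactly `n_ctx` tokens.

    Assumption: a trailing remainder shorter than `n_ctx` is discarded.
    """
    n_chunks = len(token_ids) // n_ctx
    return [token_ids[i * n_ctx:(i + 1) * n_ctx] for i in range(n_chunks)]


# Stand-in for a tokenized calibration file; real ids would come from the
# model's tokenizer. The extra 100 tokens form a partial window that is dropped.
fake_tokens = list(range(189 * 512 + 100))
chunks = split_into_chunks(fake_tokens, n_ctx=512)
print(len(chunks), sum(len(c) for c in chunks))  # prints: 189 96768
```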

For a brief rundown of iMatrix quant performance, see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747).

<b>All quants are verified working before upload, for your safety and convenience.</b>

Original model card [here](https://huggingface.co/jukofyork/Dusk-Miqu-70B/) and below.


---

# **UNTESTED, probably unfit for human consumption**

Trained for 1 epoch on grimulkan/LimaRP-augmented, starting from LLaMA3-8b, via unsloth on Colab, using the llama-chat template. 16k context, probably.

```python
# Training fragment. `model`, `tokenizer`, `dataset`, and `max_seq_length`
# are assumed to be defined earlier (not shown here).
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,  # LoRA rank; any number > 0 works. Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

# Supervised fine-tuning for one epoch over the "text" field of the dataset.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_steps = 5,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        # Use bf16 when the GPU supports it, otherwise fall back to fp16.
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
```
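
A few quantities implied by the settings above can be worked out directly. The sketch below assumes the standard LoRA scaling of `lora_alpha / r` (and `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled), a single GPU for the effective batch size, and a linear-warmup-then-linear-decay schedule of the kind `lr_scheduler_type = "linear"` selects. The helper names and the `total_steps` value are hypothetical, since the number of training steps is not stated here.

```python
import math

# Values copied from the training configuration above.
r, lora_alpha, use_rslora = 64, 16, False
per_device_train_batch_size, gradient_accumulation_steps = 1, 8
learning_rate, warmup_steps = 2e-4, 5


def lora_scale(alpha, rank, rslora=False):
    """Multiplier applied to the LoRA update (assumed standard formulas)."""
    return alpha / math.sqrt(rank) if rslora else alpha / rank


def linear_lr(step, total_steps, base_lr=learning_rate, warmup=warmup_steps):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


# Sequences contributing to each optimizer step (single-GPU assumption).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

print(lora_scale(lora_alpha, r, use_rslora))    # 0.25
print(lora_scale(lora_alpha, r, rslora=True))   # 2.0
print(effective_batch)                          # 8

total_steps = 100  # hypothetical; depends on the dataset size
print([linear_lr(s, total_steps) for s in (0, 5, 100)])  # [0.0, 0.0002, 0.0]
```

With `r = 64` and `lora_alpha = 16`, the adapter update is scaled by 0.25, which is fairly conservative; enabling `use_rslora` would raise that to 2.0 under the same assumptions.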