|
---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- unsloth
- transformers
- tinyllama
---
|
|
|
# Finetune Mistral, Gemma, and Llama 2-5x faster with 70% less memory via Unsloth!
|
|
|
This model is a reupload of https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
|
|
|
We have a Google Colab Tesla T4 notebook for TinyLlama with a 4096-token max sequence length via RoPE scaling here: https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/u54VK8m8tk) |
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/buy%20me%20a%20coffee%20button.png" width="200"/>](https://ko-fi.com/unsloth) |
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="400"/>](https://github.com/unslothai/unsloth) |
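As noted above, Unsloth handles RoPE scaling internally, so this TinyLlama reupload can be loaded with a context longer than TinyLlama's native 2048 tokens. Below is a minimal sketch of loading at 4096, mirroring the Colab notebook; the `model_name` here is a placeholder for this repo's id, so adjust it to the repository you are actually loading:

```python
from unsloth import FastLanguageModel

# Sketch only: load this TinyLlama reupload with a 4096-token context window.
# Unsloth applies RoPE scaling internally, so max_seq_length may exceed
# TinyLlama's native 2048-token context.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama", # placeholder - use this repo's actual id
    max_seq_length = 4096,
    dtype = None,        # None = auto-detect (bf16 on supported GPUs, else fp16)
    load_in_4bit = True, # optional: load quantized to 4-bit for QLoRA finetuning
)
```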
|
|
|
```python
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4-bit pre-quantized models we support - 4x faster downloading!
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
] # Go to https://huggingface.co/unsloth for more 4-bit models!

# Load the model - Mistral is shown; swap in "unsloth/tinyllama-bnb-4bit" for this card's TinyLlama
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit", # Supports Llama, Mistral - replace this!
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / handling OOMs
# (4) Customized chat templates
```
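The wiki linked above covers these topics in detail. As a quick, hedged sketch of point (2), the LoRA adapter trained above can be saved with the standard PEFT `save_pretrained` call and later passed back to `FastLanguageModel.from_pretrained` to continue training; the `"lora_adapter"` directory name below is just an example:

```python
# Sketch: persist the LoRA adapter (and tokenizer) trained above.
# save_pretrained is the standard PEFT/transformers method; "lora_adapter"
# is an arbitrary example directory.
model.save_pretrained("lora_adapter")
tokenizer.save_pretrained("lora_adapter")

# To resume finetuning later, point from_pretrained at the saved adapter
# directory (see the wiki's continued-training notes) and rebuild the trainer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_adapter",      # path to the adapter saved above
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)
```

For GGUF export or merging the adapter into 16-bit weights for vLLM (point (1)), see the wiki; Unsloth provides dedicated save helpers for those paths.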
|
|