LoRA methods

A popular way to efficiently train large models is to insert (typically in the attention blocks) smaller trainable matrices that are a low-rank decomposition of the delta weight matrix to be learnt during finetuning. The pretrained model’s original weight matrix is frozen and only the smaller matrices are updated during training. This reduces the number of trainable parameters, reducing memory usage and training time which can be very expensive for large models.

There are several different ways to express the weight matrix as a low-rank decomposition, but Low-Rank Adaptation (LoRA) is the most common method. The PEFT library supports several other LoRA variants, such as Low-Rank Hadamard Product (LoHa), Low-Rank Kronecker Product (LoKr), and Adaptive Low-Rank Adaptation (AdaLoRA). You can learn more about how these methods work conceptually in the Adapters guide. If you’re interested in applying these methods to other tasks and use cases like semantic segmentation, token classification, take a look at our notebook collection!

Additionally, PEFT supports the X-LoRA Mixture of LoRA Experts method.

This guide will show you how to quickly train an image classification model - with a low-rank decomposition method - to identify the class of food shown in an image.

Some familiarity with the general process of training an image classification model would be really helpful and allow you to focus on the low-rank decomposition methods. If you’re new, we recommend taking a look at the Image classification guide first from the Transformers documentation. When you’re ready, come back and see how easy it is to drop PEFT in to your training!

Before you begin, make sure you have all the necessary libraries installed.

pip install -q peft transformers datasets

Dataset

In this guide, you’ll use the Food-101 dataset which contains images of 101 food classes (take a look at the dataset viewer to get a better idea of what the dataset looks like).

Load the dataset with the load_dataset function.

from datasets import load_dataset

ds = load_dataset("food101")

Each food class is labeled with an integer, so to make it easier to understand what these integers represent, you’ll create a label2id and id2label dictionary to map the integer to its class label.

labels = ds["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = i
    id2label[i] = label

id2label[2]
"baklava"

Load an image processor to properly resize and normalize the pixel values of the training and evaluation images.

from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

You can also use the image processor to prepare some transformation functions for data augmentation and pixel scaling.

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
    [
        RandomResizedCrop(image_processor.size["height"]),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize,
    ]
)

val_transforms = Compose(
    [
        Resize(image_processor.size["height"]),
        CenterCrop(image_processor.size["height"]),
        ToTensor(),
        normalize,
    ]
)

def preprocess_train(example_batch):
    example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

def preprocess_val(example_batch):
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

Define the training and validation datasets, and use the set_transform function to apply the transformations on-the-fly.

train_ds = ds["train"]
val_ds = ds["validation"]

train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)

Finally, you’ll need a data collator to create a batch of training and evaluation data and convert the labels to torch.tensor objects.

import torch

def collate_fn(examples):
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

Model

Now let’s load a pretrained model to use as the base model. This guide uses the google/vit-base-patch16-224-in21k model, but you can use any image classification model you want. Pass the label2id and id2label dictionaries to the model so it knows how to map the integer labels to their class labels, and you can optionally pass the ignore_mismatched_sizes=True parameter if you’re finetuning a checkpoint that has already been finetuned.

from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)

PEFT configuration and model

Every PEFT method requires a configuration that holds all the parameters specifying how the PEFT method should be applied. Once the configuration is setup, pass it to the get_peft_model() function along with the base model to create a trainable PeftModel.

Call the print_trainable_parameters() method to compare the number of parameters of PeftModel versus the number of parameters in the base model!

LoRA

LoHa

LoKr

AdaLoRA

Training

For training, let’s use the Trainer class from Transformers. The Trainer contains a PyTorch training loop, and when you’re ready, call train to start training. To customize the training run, configure the training hyperparameters in the TrainingArguments class. With LoRA-like methods, you can afford to use a higher batch size and learning rate.

AdaLoRA has an update_and_allocate() method that should be called at each training step to update the parameter budget and mask, otherwise the adaptation step is not performed. This requires writing a custom training loop or subclassing the Trainer to incorporate this method. As an example, take a look at this custom training loop.

from transformers import TrainingArguments, Trainer

account = "stevhliu"
peft_model_id = f"{account}/google/vit-base-patch16-224-in21k-lora"
batch_size = 128

args = TrainingArguments(
    peft_model_id,
    remove_unused_columns=False,
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    num_train_epochs=5,
    logging_steps=10,
    load_best_model_at_end=True,
    label_names=["labels"],
)

Begin training with train.

trainer = Trainer(
    model,
    args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=image_processor,
    data_collator=collate_fn,
)
trainer.train()

Share your model

Once training is complete, you can upload your model to the Hub with the push_to_hub method. You’ll need to login to your Hugging Face account first and enter your token when prompted.

from huggingface_hub import notebook_login

notebook_login()

Call push_to_hub to save your model to your repositoy.

model.push_to_hub(peft_model_id)

Inference

Let’s load the model from the Hub and test it out on a food image.

from peft import PeftConfig, PeftModel
from transformers import AutoImageProcessor
from PIL import Image
import requests

config = PeftConfig.from_pretrained("stevhliu/vit-base-patch16-224-in21k-lora")
model = AutoModelForImageClassification.from_pretrained(
    config.base_model_name_or_path,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)
model = PeftModel.from_pretrained(model, "stevhliu/vit-base-patch16-224-in21k-lora")

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
image

Convert the image to RGB and return the underlying PyTorch tensors.

encoding = image_processor(image.convert("RGB"), return_tensors="pt")

Now run the model and return the predicted class!

with torch.no_grad():
    outputs = model(**encoding)
    logits = outputs.logits

predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
"Predicted class: beignets"

< > Update on GitHub