QLoRA-Flan-T5-Small

This model is a fine-tuned version of google/flan-t5-small on the cnn_dailymail dataset. It achieves the following results on the test set:

  • ROUGE-1: 0.3484265780526604
  • ROUGE-2: 0.14343059577230782
  • ROUGE-L: 0.32809541498574013
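ROUGE-1 and ROUGE-2 measure how many unigrams and bigrams a generated summary shares with the reference highlights. The sketch below is a simplified, hypothetical illustration of ROUGE-N F1. It is not the scorer that produced the numbers above, since the card does not say which implementation was used. It also assumes lowercase whitespace tokenization and skips the tokenization rules and optional stemming that dedicated ROUGE packages may apply.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap, returned as (precision, recall, F1).

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps min(count in ref, count in cand) for each n-gram
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(reference, candidate, n=1))
print(rouge_n(reference, candidate, n=2))
```

Here 5 of the 6 unigrams match, giving ROUGE-1 F1 of 5/6, while only 3 of the 5 bigrams match, giving ROUGE-2 F1 of 0.6. This gap between unigram and bigram overlap is also why ROUGE-2 is lower than ROUGE-1 in the results above.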

Model description

This model was fine-tuned for abstractive summarization.

Training and evaluation data

Fine-tuned on the cnn_dailymail training set and evaluated on the cnn_dailymail test set.

How to use the model

  1. Loading the model
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the PEFT (LoRA adapter) config; it records which base checkpoint was fine-tuned
peft_model_id = "emonty777/QLoRA-Flan-T5-Small"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model: in 8-bit on the first GPU if one is available
# (this needs bitsandbytes and accelerate), otherwise in full precision on CPU
if torch.cuda.is_available():
    model = AutoModelForSeq2SeqLM.from_pretrained(
        config.base_model_name_or_path, load_in_8bit=True, device_map={"": 0}
    )
else:
    model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA adapter weights to the base model and switch to inference mode
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()
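PeftModel.from_pretrained keeps the base Flan-T5 weights frozen and adds small low-rank adapter matrices to some of its layers. As a rough illustration of the LoRA idea, and not PEFT's actual code, the effective weight of an adapted layer is W + (alpha / r) · B A. The layer shape, rank r and scaling alpha below are hypothetical. The values this adapter actually uses are stored in its LoRA config as r and lora_alpha.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the effective weight of a LoRA-adapted layer.

    w: d_out x d_in frozen base weight
    b: d_out x r trainable matrix
    a: r x d_in trainable matrix
    """
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, same shape as w
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Hypothetical 2x3 layer with a rank-1 adapter (values chosen only for illustration)
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
b = [[1.0],
     [2.0]]             # d_out x r
a = [[0.5, 0.0, 1.0]]   # r x d_in
print(lora_merge(w, a, b, alpha=2, r=1))
```

Only A and B are trained, which is why the adapter repository is small compared with the base model. With rank r, a d_out x d_in layer adds r · (d_out + d_in) trainable values instead of d_out · d_in.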
  2. Generating summaries
text = "Your text goes here..."

# Tokenize and move the input IDs to the same device as the model (CPU or GPU)
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(model.device)
# max_new_tokens caps the summary length; 120 is set up for news-article inputs, so adjust as needed
outputs = model.generate(input_ids=input_ids, max_new_tokens=120, do_sample=False)

print(f"input text: {text}\n{'---' * 20}")
print(f"summary:\n{tokenizer.decode(outputs[0], skip_special_tokens=True)}")
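With do_sample=False, and assuming beam search is left off (num_beams=1), model.generate decodes greedily. At every step it appends the single most likely next token, and it stops when it emits the end-of-sequence token or reaches max_new_tokens. The toy sketch below shows only that control flow. The score table, token strings and helper names are hypothetical stand-ins for the model's logits and vocabulary and are not part of the transformers API.

```python
def greedy_decode(next_token_scores, start_token, eos_token, max_new_tokens):
    """Greedy decoding: at each step append the single highest-scoring token.

    next_token_scores(tokens) -> dict mapping candidate token -> score.
    It stands in for a model's next-token logits (hypothetical here).
    """
    tokens = [start_token]
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)  # no sampling: always take the argmax
        tokens.append(best)
        if best == eos_token:
            break  # stop early once the end-of-sequence token is produced
    return tokens[1:]  # drop the start token


# Toy "model": a fixed table of next-token scores keyed by the previous token
TABLE = {
    "<s>": {"police": 0.7, "a": 0.3},
    "police": {"arrest": 0.6, "say": 0.4},
    "arrest": {"suspect": 0.9, "</s>": 0.1},
    "suspect": {"</s>": 0.8, ".": 0.2},
}


def toy_scores(tokens):
    return TABLE[tokens[-1]]


print(greedy_decode(toy_scores, "<s>", "</s>", max_new_tokens=120))
print(greedy_decode(toy_scores, "<s>", "</s>", max_new_tokens=2))
```

The second call shows how a small max_new_tokens cuts a summary off mid-sentence, which is why the value should be large enough for the expected summary length.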

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 4
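The listed optimizer and scheduler settings can be illustrated with a small sketch: a linear learning-rate schedule starting at 3e-05, shaped like the one in transformers' get_linear_schedule_with_warmup, and a textbook Adam update using the listed betas and epsilon. This is not the author's training script. The card does not give the number of optimizer steps or any warmup, so total_steps=1000 and warmup_steps=0 are placeholders. In practice total_steps would be num_epochs times the number of batches of size 8 per epoch.

```python
import math


def linear_lr(step, base_lr=3e-05, total_steps=1000, warmup_steps=0):
    """Linear schedule: optional warmup, then linear decay from base_lr to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-08):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical run length: 4 epochs x 250 steps per epoch = 1000 optimizer steps
total = 4 * 250
print(linear_lr(0, total_steps=total), linear_lr(500, total_steps=total), linear_lr(1000, total_steps=total))
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1, lr=linear_lr(0, total_steps=total))
print(param)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is about 1, so the parameter moves by roughly the current learning rate regardless of the gradient's scale.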

Training results

Evaluated on the full CNN/DailyMail test set:

  • ROUGE-1: 0.3484265780526604
  • ROUGE-2: 0.14343059577230782
  • ROUGE-L: 0.32809541498574013
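ROUGE-L scores the longest common subsequence (LCS) of tokens shared by the summary and the reference. It therefore rewards words that appear in the same order without requiring them to be contiguous. The sketch below is a simplified, hypothetical sentence-level version using lowercase whitespace tokens. It is not the scorer behind the numbers above, and some implementations also report a summary-level variant (ROUGE-Lsum) that splits the text into sentences first.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise keep the best of skipping x or y
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Simplified sentence-level ROUGE-L F1 from the LCS of whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l("police arrested the suspect on friday", "the suspect was arrested friday"))
```

The LCS here is "the suspect friday" (3 tokens), so precision is 3/5, recall is 3/6, and F1 is 6/11. "arrested" is not counted because it appears in a different order in the two texts.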

Framework versions

  • Transformers 4.27.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.9.0
  • Tokenizers 0.13.3