---
license: apache-2.0
datasets:
- samsum
language:
- en
library_name: transformers
tags:
- peft
- lora
- t5
- flan
metrics:
- rouge
model-index:
- name: flan-t5-xxl-samsum-peft
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: train
      args: samsum
    metrics:
    - name: Rouge1
      type: rouge
      value: 50.386161
---

# FLAN-T5-XXL LoRA fine-tuned on `samsum`

PEFT tuned FLAN-T5 XXL model.

# flan-t5-xxl-samsum-peft

This model is a fine-tuned version of [philschmid/flan-t5-xxl-sharded-fp16](https://huggingface.co/philschmid/flan-t5-xxl-sharded-fp16) on the samsum dataset.
It achieves the following results on the evaluation set:

- rouge1: 50.386161%
- rouge2: 24.842412%
- rougeL: 41.370130%
- rougeLsum: 41.394230%

## How to use the model

The model was trained using 🤗 [PEFT](https://github.com/huggingface/peft). This repository contains only the fine-tuned LoRA adapter weights and the configuration needed to load them on top of the base model. The snippet below shows how to run inference. If the base FLAN-T5-XXL model is not available locally, it is downloaded from the Hugging Face Hub.

1. Load the model

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the PEFT config; it records which base checkpoint the adapter was trained on
peft_model_id = "philschmid/flan-t5-xxl-samsum-peft"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit on GPU 0 (requires bitsandbytes) and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map={"": 0})
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA adapter weights to the base model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"": 0})
model.eval()
```

2. Generate

```python
# Placeholder input; replace with the dialogue you want to summarize
text = "test"

# Tokenize and move the input IDs to the GPU that holds the model
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.cuda()

# Sample up to 10 new tokens with nucleus sampling (top_p=0.9)
outputs = model.generate(input_ids=input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)

# Decode the first generated sequence back to text
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-3
- train_batch_size: auto
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5

### Framework versions

- Transformers 4.27.1
- Pytorch 1.13.1+cu117
- Datasets 2.9.1
- PEFT@main
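
Because this repository ships only the LoRA adapter weights, it can help to see why an adapter is so much smaller than the matrices it modifies. The sketch below is plain arithmetic and is not taken from this repository: LoRA adds two low-rank factors (one of shape `rank x d_in`, one of shape `d_out x rank`) next to a frozen `d_out x d_in` weight. The helper names, the 4096 x 4096 shape, and the rank of 16 are illustrative assumptions; the actual rank and target modules are defined in the adapter configuration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix the adapter is attached to."""
    return d_in * d_out


# Illustrative values only (assumptions, not read from this repository):
# a square 4096 x 4096 projection and a LoRA rank of 16.
d_in, d_out, rank = 4096, 4096, 16

adapter = lora_param_count(d_in, d_out, rank)
full = full_param_count(d_in, d_out)
ratio = adapter / full

print(f"adapter: {adapter:,} params, full matrix: {full:,} params, ratio: {ratio:.2%}")
```

Under these assumed values the adapter holds well under 1% of the parameters of the matrix it adapts, which is why the base model has to be loaded separately before the adapter is applied.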