# Model Card for PrekshaJoon/flan-t5-finetuned-summarization

The model was fine-tuned on the CNN/DailyMail dataset, which consists of news articles paired with human-written summaries.

## Model Details

### Model Description

The training process involved:

1. Loading the pre-trained FLAN-T5 model
2. Preprocessing the CNN/DailyMail dataset
3. Fine-tuning the model using the `Seq2SeqTrainer` from Hugging Face's Transformers library
4. Training with the following parameters:
   - Learning rate: 5e-5
   - Batch size: 12
   - Number of epochs: 4
   - FP16 mixed precision

- **Developed by:** Preksha Joon
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** FLAN-T5

### Model Sources [optional]

## Uses

Here's an example of how to use the model for inference:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")
tokenizer = AutoTokenizer.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")

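def generate_summary(article):
    # Prepend the task prefix and truncate the input to at most 512 tokens.
    inputs = tokenizer("summarize: " + article, return_tensors="pt", max_length=512, truncation=True)
    # Pass input_ids and attention_mask together; beam search with 4 beams, summary capped at 128 tokens.
    summary_ids = model.generate(**inputs, max_length=128, num_beams=4, early_stopping=True)
    # generate() returns a batch of sequences, so decode the first (and only) one.
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary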

## Deploy and use the model

from transformers import pipeline

summarizer = pipeline("summarization", model="PrekshaJoon/flan-t5-finetuned-summarization")

article = "Write your article here..."
summary = summarizer(article, max_length=128, min_length=30, length_penalty=2.0, num_beams=4, early_stopping=True)

print(summary[0]['summary_text'])

### Direct Use

article = "Your long article text here..."
summary = generate_summary(article)
print(summary)



## Bias, Risks, and Limitations

[More Information Needed]

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

### Training Data

[More Information Needed]

### Training Procedure

The training process involved:

1. Loading the pre-trained FLAN-T5 model
2. Preprocessing the CNN/DailyMail dataset
3. Fine-tuning the model using the `Seq2SeqTrainer` from Hugging Face's Transformers library

#### Preprocessing

The dataset was tokenized and formatted as input for the FLAN-T5 model.
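
The card does not include the preprocessing code, so the following is only a minimal sketch of what the tokenization step could look like. It assumes the `article` and `highlights` column names of the CNN/DailyMail dataset, reuses the `"summarize: "` prefix and the 512/128 token limits from the inference example in this card, and takes the tokenizer as an argument. The function name `preprocess_batch` and its parameters are hypothetical.

```python
def preprocess_batch(batch, tokenizer, max_input_length=512, max_target_length=128):
    """Tokenize a batch of articles and reference summaries for a seq2seq model.

    Hypothetical helper: the limits mirror the inference example in this card,
    not a confirmed training configuration.
    """
    # Add the same task prefix that is used at inference time.
    inputs = ["summarize: " + article for article in batch["article"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # text_target tells a Transformers tokenizer to encode the summaries as labels.
    labels = tokenizer(
        text_target=batch["highlights"],
        max_length=max_target_length,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

With the `datasets` library, a function like this would typically be applied with `dataset.map(lambda b: preprocess_batch(b, tokenizer), batched=True)`.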


#### Training Hyperparameters

- **Training regime:**  fp16 mixed precision
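
As a rough sketch, the hyperparameters listed under Model Description could be passed to `Seq2SeqTrainingArguments` as shown below. The `output_dir` value and the helper name `build_training_args` are hypothetical, and mapping "Batch size: 12" to `per_device_train_batch_size` is an assumption, since the card does not say whether the batch size is per device or global.

```python
# Hyperparameters reported in this card, keyed by Seq2SeqTrainingArguments field names.
hparams = {
    "output_dir": "flan-t5-finetuned-summarization",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 12,  # assumption: 12 is the per-device batch size
    "num_train_epochs": 4,
    "fp16": True,  # mixed precision; requires a GPU that supports it
}


def build_training_args(overrides=None):
    """Build Seq2SeqTrainingArguments from the values above (hypothetical helper)."""
    # Imported lazily so the hyperparameter dict can be inspected without Transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**hparams, **(overrides or {})})
```

The resulting object would then be handed to `Seq2SeqTrainer` together with the model, tokenizer, and tokenized dataset.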

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]


#### Metrics

ROUGE scores (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum) are used as the evaluation metric.
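
To illustrate what ROUGE-N measures, here is a simplified sketch that computes an n-gram overlap F1 score between a reference and a candidate summary. It uses plain lowercased whitespace tokenization and is not the implementation behind the scores reported in this card; established packages such as `rouge-score` apply their own tokenization and optional stemming, so their numbers will differ. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1 using lowercased whitespace tokens."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n_f1("the cat sat on the mat", "the cat sat")` has precision 1.0 and recall 0.5, which gives an F1 of about 0.667.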

### Evaluation Results

The model was evaluated using ROUGE scores. Here are the results on the validation set:

- **rouge1:** 0.3913
- **rouge2:** 0.2889
- **rougeL:** 0.3696
- **rougeLsum:** 0.3696

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** A100 GPU
- **Hours used:** 7
- **Cloud Provider:** Google
- **Compute Region:** [More Information Needed]
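
A back-of-the-envelope estimate multiplies GPU hours by power draw and by the carbon intensity of the local grid. The sketch below shows only that arithmetic. The 0.4 kW power figure assumes the 400 W rating of the A100 SXM variant, which the card does not specify, and the carbon intensity is a placeholder because the compute region is not given; neither value is a measurement from this training run. The function name is hypothetical.

```python
def estimate_co2_kg(hours, gpu_power_kw, carbon_intensity_kg_per_kwh):
    """Estimate emissions in kg CO2eq as energy used times grid carbon intensity."""
    energy_kwh = hours * gpu_power_kw
    return energy_kwh * carbon_intensity_kg_per_kwh


# 7 hours comes from this card; the other two values are illustrative assumptions.
estimate = estimate_co2_kg(hours=7, gpu_power_kw=0.4, carbon_intensity_kg_per_kwh=0.4)
```

Substituting the real carbon intensity for the compute region, once known, would give a more meaningful figure.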


## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]


## Glossary [optional]

[More Information Needed]


## Model Card Contact
[email protected]