Model Card for Gemma-2-2B-it Telugu News Headline Generator

This model is a fine-tuned version of Google's instruction-tuned Gemma-2-2B model (google/gemma-2-2b-it), optimized for generating Telugu news headlines from article content. It was trained with Supervised Fine-Tuning (SFT) on a Telugu news dataset.

Model Details

Model Description

  • Developed by: Google (base model); fine-tuned for Telugu news headline generation
  • Model type: Decoder-only transformer language model
  • Language(s): Telugu
  • License: Apache 2.0
  • Finetuned from model: Gemma-2-2B

Model Sources

  • Repository: saidines12/telugu-news-headline-generation on the Hugging Face Hub
  • Base Model: google/gemma-2-2b-it

How to Get Started with the Model

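from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "saidines12/telugu-news-headline-generation"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The instruction is followed by the article text on the next line.
text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
inputs = tokenizer(text, return_tensors="pt")
# Cap the generated length explicitly; without it, generate() may stop after a short default limit.
# 64 new tokens is an arbitrary choice for a headline.
outputs = model.generate(**inputs, max_new_tokens=64)
# The decoded string contains the prompt followed by the generated headline.
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)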
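Because a causal language model returns the prompt tokens followed by the newly generated ones, the decoded string above still contains the instruction and the article. A minimal sketch of one way to keep only the headline is shown below. `build_prompt` and `strip_prompt_tokens` are hypothetical helper names, not part of this model or of transformers, and the token ids are toy values.

```python
INSTRUCTION = (
    "Generate relevant, interesting, factual short headline "
    "from this news article in telugu language"
)


def build_prompt(article: str) -> str:
    """Join the instruction and the article the same way as the example prompt."""
    return f"{INSTRUCTION}\n {article.strip()}"


def strip_prompt_tokens(output_ids, prompt_length):
    """Drop the first prompt_length token ids, keeping only the generated ones."""
    return output_ids[prompt_length:]


# Toy token ids standing in for real tokenizer output (illustrative values only).
prompt_ids = [2, 101, 102, 103]
generated = prompt_ids + [201, 202, 1]
new_ids = strip_prompt_tokens(generated, len(prompt_ids))
print(new_ids)
```

With the real model, the same idea would be to decode `outputs[0][inputs["input_ids"].shape[1]:]` instead of `outputs[0]`.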

Training Details

Training Data

  • Telugu news articles and headlines dataset
  • Data cleaned and preprocessed for headline generation task
  • Articles spanning various news categories

Training Procedure

Training Hyperparameters

  • Training regime: FP16 mixed precision
  • Batch size: 6 per device
  • Gradient accumulation steps: 4
  • Learning rate: 2e-4
  • Maximum steps: 20,000
  • Warmup steps: 25
  • Optimizer: AdamW
  • Evaluation strategy: Every 20,000 steps (that is, once, at the end of training)

Hardware Specifications

  • GPU training with gradient checkpointing
  • Parallel data loading with 8 workers
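
As a rough sketch of what the settings above imply: with 6 sequences per device and 4 gradient accumulation steps, each optimizer step covers 24 sequences per device. The dictionary keys below mirror `transformers.TrainingArguments` parameter names, but that the training run used exactly these arguments is an assumption, as is the single-GPU setup (the card does not state how many devices were used).

```python
# Values copied from the hyperparameter and hardware lists above; the key names
# follow transformers.TrainingArguments, which is an assumption about the setup.
training_config = {
    "per_device_train_batch_size": 6,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "max_steps": 20_000,
    "warmup_steps": 25,
    "fp16": True,
    "gradient_checkpointing": True,
    "dataloader_num_workers": 8,
}

num_devices = 1  # assumption: the card does not say how many GPUs were used

# Sequences contributing to each optimizer update.
effective_batch = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
    * num_devices
)
# Upper bound on sequences processed if training runs for all max_steps.
sequences_seen = effective_batch * training_config["max_steps"]

print(f"effective batch size: {effective_batch}")
print(f"sequences processed over training: {sequences_seen:,}")
```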

Evaluation

ROUGE Score Comparison

| Metric | Base Model | Finetuned Model | Improvement |
|---|---|---|---|
| ROUGE-1 | 2.85 | 4.67 | +1.82 |
| ROUGE-2 | 0.25 | 0.41 | +0.17 |
| ROUGE-L | 2.84 | 4.65 | +1.81 |
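
The card does not say which ROUGE implementation or tokenizer produced these numbers. For illustration only, here is a minimal sketch of ROUGE-N and ROUGE-L F1 over whitespace-separated tokens, which keeps Telugu words intact (a tokenizer that keeps only ASCII letters and digits would discard Telugu text entirely). The function names and the example sentences are made up for this sketch, and its scores will not necessarily match those in the table.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """F1 over overlapping n-grams of whitespace-separated tokens."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge_l_f1(reference: str, candidate: str) -> float:
    """F1 based on the longest common subsequence (LCS) of tokens."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    # Dynamic-programming table holding LCS lengths of all prefixes.
    table = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            if r == c:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    lcs = table[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up reference and candidate headlines, not taken from the dataset.
reference = "నగరంలో భారీ వర్షం"
candidate = "నగరంలో వర్షం"
print(rouge_n_f1(reference, candidate), rouge_l_f1(reference, candidate))
```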

Model Prediction Comparison (Judged by a Larger Model)

| Category | Count | Percentage |
|---|---|---|
| Total samples | 5962 | 100% |
| Same predictions | 1 | 0.02% |
| Better predictions | 4697 | 78.78% |
| Worse predictions | 1264 | 21.20% |
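
The card does not describe how the larger model judged each pair of headlines. Assuming each sample ends up with a single verdict, the percentages follow directly from the counts. A small sketch, where the labels "better", "worse", and "same" and the function name are hypothetical:

```python
from collections import Counter


def summarize_verdicts(verdicts):
    """Return {label: (count, percentage)} for a list of per-sample verdicts."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


# Rebuild the verdict list from the counts reported in the table above.
verdicts = ["better"] * 4697 + ["worse"] * 1264 + ["same"] * 1
summary = summarize_verdicts(verdicts)
for label, (count, pct) in summary.items():
    print(f"{label}: {count} ({pct}%)")
```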

Evaluation Methods

  • ROUGE scores for headline similarity
  • Human evaluation for headline appropriateness

Inference

Running the model on a GPU using different precisions

  • Using torch.float16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "saidines12/telugu-news-headline-generation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in half precision; device_map="auto" places them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Cap the generated length explicitly; 64 new tokens is an arbitrary choice for a headline.
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
  • Using torch.bfloat16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "saidines12/telugu-news-headline-generation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16; device_map="auto" places them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Cap the generated length explicitly; 64 new tokens is an arbitrary choice for a headline.
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))

Quantized Versions through bitsandbytes

  • Using 8-bit precision (int8)
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

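# Cap the generated length explicitly; 64 new tokens is an arbitrary choice for a headline.
outputs = model.generate(**input_ids, max_new_tokens=64)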
print(tokenizer.decode(outputs[0]))
  • Using 4-bit precision
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

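# Cap the generated length explicitly; 64 new tokens is an arbitrary choice for a headline.
outputs = model.generate(**input_ids, max_new_tokens=64)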
print(tokenizer.decode(outputs[0]))
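
To give a sense of why the precision choice matters, the sketch below estimates the memory needed for the weights alone at each precision. It assumes about 2.61 billion parameters (the size reported on the model's Hub page) and ignores activations, the KV cache, and the extra bookkeeping that quantization formats carry, so real usage will be higher.

```python
PARAMS = 2.61e9  # assumption: parameter count reported on the model's Hub page

# Approximate storage cost per parameter for each precision discussed above.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>8}: ~{weight_memory_gb(PARAMS, size):.2f} GB")
```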

Other optimizations

  • Flash Attention 2

First, install flash-attn in your environment: pip install flash-attn

import torch
from transformers import AutoModelForCausalLM

model_id = "saidines12/telugu-news-headline-generation"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to(0)
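
If flash-attn may not be installed everywhere the model runs, one option is to choose the attention implementation at load time. The sketch below only checks whether the `flash_attn` package can be found; it does not verify that the GPU actually supports it. `pick_attn_implementation` is a hypothetical helper name, and falling back to `"eager"` is an assumption about what suits your setup.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash_attn is installed, else "eager"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


attn_implementation = pick_attn_implementation()
print(attn_implementation)
```

The returned string can then be passed as `attn_implementation=attn_implementation` to `AutoModelForCausalLM.from_pretrained`.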

Inputs and outputs

  • Input: A text string consisting of the instruction prompt followed by the Telugu news article.
  • Output: A generated Telugu-language headline for the article.

Technical Specifications

Model Architecture and Objective

  • Base architecture: Gemma-2
  • Training objective: Supervised fine-tuning for headline generation
  • Gradient checkpointing enabled for memory efficiency
  • Optimized data loading with pinned memory

Software

  • PyTorch
  • Transformers library
  • TRL for supervised fine-tuning
  • CUDA for GPU acceleration

Uses

Direct Use

This model is designed for generating Telugu news headlines from article content. It can be used by:

  • News organizations for automated headline generation
  • Content creators working with Telugu news content
  • Researchers studying Telugu natural language generation

Out-of-Scope Use

  • The model should not be used for generating fake news or misleading headlines
  • Not suitable for non-Telugu content
  • Not designed for general text generation tasks
  • Should not be used for classification or other non-headline generation tasks

Bias, Risks, and Limitations

  • May reflect biases present in Telugu news media
  • Performance may vary based on news domain and writing style
  • Limited to the vocabulary and patterns present in the training data
  • May occasionally generate grammatically incorrect Telugu text
  • Could potentially generate sensationalized headlines

Recommendations

  • Use with human oversight for published content
  • Verify generated headlines for accuracy
  • Monitor output for potential biases
  • Implement content filtering for inappropriate generations
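
As one possible starting point for the oversight and filtering steps above, the sketch below flags headlines that are empty, unusually long, or mostly written in a non-Telugu script so that a human can review them. The helper names and the thresholds (80% Telugu characters, 120 characters) are illustrative assumptions, and a check like this does not detect factual errors or bias.

```python
import unicodedata

TELUGU_START, TELUGU_END = 0x0C00, 0x0C7F  # Unicode Telugu block


def telugu_ratio(text: str) -> float:
    """Fraction of letters and combining marks that fall in the Telugu block."""
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    telugu = sum(TELUGU_START <= ord(ch) <= TELUGU_END for ch in relevant)
    return telugu / len(relevant)


def needs_review(headline: str, min_ratio: float = 0.8, max_chars: int = 120) -> bool:
    """Flag headlines that are empty, too long, or mostly non-Telugu."""
    stripped = headline.strip()
    if not stripped or len(stripped) > max_chars:
        return True
    return telugu_ratio(stripped) < min_ratio


print(needs_review("తెలుగు వార్తలు"))  # Telugu script only
print(needs_review("Breaking news"))  # Latin script only
```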