Model Card for Gemma-2-2B-it Telugu News Headline Generator

This model is a fine-tuned version of Google's instruction-tuned Gemma-2-2B model (google/gemma-2-2b-it), optimized for generating Telugu news headlines from article content. It was trained with Supervised Fine-Tuning (SFT) on a dataset of Telugu news articles and their headlines.

Model Details

Model Description

Developed by: Google (base model), fine-tuned on Telugu news data
Model type: Decoder-only transformer language model
Language(s): Telugu
License: Apache 2.0
Finetuned from model: Gemma-2-2B-it (google/gemma-2-2b-it)

Model Sources

Repository: saidines12/telugu-news-headline-generation (Hugging Face Hub)
Base Model: google/gemma-2-2b-it

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation")
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")

text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)
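
For a causal language model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded string above will usually contain the instruction and the article as well as the headline. The following is a minimal sketch of one way to build the prompt and strip it back out of the decoded text. The helper names `build_prompt` and `extract_headline` are hypothetical and not part of this repository, and the sketch assumes the decoded text starts with the prompt verbatim, which may not hold if the tokenizer normalizes whitespace.

```python
INSTRUCTION = (
    "Generate relevant, interesting, factual short headline "
    "from this news article in telugu language"
)


def build_prompt(article: str) -> str:
    """Join the instruction used in this card with the article text."""
    # The examples in this card separate instruction and article with "\n ".
    return f"{INSTRUCTION}\n {article.strip()}"


def extract_headline(decoded: str, prompt: str) -> str:
    """Return only the generated part of a decoded output string."""
    # Assumption: the decoded text begins with the prompt verbatim.
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    # Keep the first non-empty line as the headline.
    for line in decoded.splitlines():
        if line.strip():
            return line.strip()
    return ""


prompt = build_prompt("<Your Telugu news article text here>")
# Stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True).
decoded = prompt + "\nతెలుగు శీర్షిక"
headline = extract_headline(decoded, prompt)
print(headline)
```

Passing `max_new_tokens` to `model.generate` is a common way to bound the headline length; a suitable value is not stated in this card and would need to be chosen by experiment.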

Training Details

Training Data

Telugu news articles and headlines dataset
Data cleaned and preprocessed for headline generation task
Articles spanning various news categories

Training Procedure

Training Hyperparameters

Training regime: FP16 mixed precision
Batch size: 6 per device
Gradient accumulation steps: 4
Learning rate: 2e-4
Maximum steps: 20,000
Warmup steps: 25
Optimizer: AdamW
Evaluation strategy: Every 20,000 steps
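
The hyperparameters above imply an effective batch size and a rough amount of data seen during training. The arithmetic below is a sketch under two assumptions that this card does not confirm: that "Maximum steps" counts optimizer updates (as it does in the Transformers `Trainer`), and that the figures are per device, since the number of devices is not stated.

```python
per_device_batch_size = 6
gradient_accumulation_steps = 4
max_steps = 20_000
warmup_steps = 25

# Sequences contributing to one optimizer update on a single device.
effective_batch_per_device = per_device_batch_size * gradient_accumulation_steps

# Upper bound on sequences processed per device over the whole run.
sequences_per_device = effective_batch_per_device * max_steps

# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / max_steps

print(effective_batch_per_device)  # 24
print(sequences_per_device)        # 480000
print(f"{warmup_fraction:.4%}")    # 0.1250%
```

With an evaluation interval equal to the maximum number of steps, evaluation would run once, at the end of training.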

Hardware Specifications

GPU training with gradient checkpointing
Parallel data loading with 8 workers

Evaluation

ROUGE Score Comparison

| Metric | Base Model | Finetuned Model | Improvement |
|---|---|---|---|
| ROUGE-1 | 2.85 | 4.67 | +1.82 |
| ROUGE-2 | 0.25 | 0.41 | +0.17 |
| ROUGE-L | 2.84 | 4.65 | +1.81 |
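
This card does not say which ROUGE implementation or tokenizer produced the scores above. As an illustration of what ROUGE-1 measures, here is a minimal sketch of unigram-overlap F1 using plain whitespace tokenization. The function name `rouge1_f1` is hypothetical, and the result will generally differ from library implementations, which apply their own tokenization and optional stemming.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Clipped overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Two of three candidate tokens match two of four reference tokens.
score = rouge1_f1("a b c d", "a b x")
print(round(score, 4))  # 0.5714
```

ROUGE scores depend heavily on tokenization, so tokenizers designed for English text may behave differently on Telugu script than the whitespace split used here.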

Model Prediction Comparison (Judged by a Larger Model)

| Category | Count | Percentage |
|---|---|---|
| Total samples | 5962 | 100% |
| Same predictions | 1 | 0.02% |
| Better predictions | 4697 | 78.78% |
| Worse predictions | 1264 | 21.20% |
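
The percentages in the table can be recomputed from the raw counts. The short sketch below does so and also derives a win rate that excludes the single tie. That win rate is an additional summary not reported in this card.

```python
counts = {"same": 1, "better": 4697, "worse": 1264}
total = sum(counts.values())

# Percentage share of each category, rounded to two decimals as in the table.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Win rate among samples where the two models were judged different.
win_rate = counts["better"] / (counts["better"] + counts["worse"])

print(total)
print(shares)
print(f"{win_rate:.2%}")
```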

Evaluation Methods

ROUGE scores for headline similarity
Human evaluation for headline appropriateness

Inference

Running the model on a GPU using different precisions

Using torch.float16

# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
# torch_dtype selects half precision; device_map="auto" places the weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.float16)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Using torch.bfloat16

# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.bfloat16)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Quantized Versions through `bitsandbytes`

Using 8-bit precision (int8)

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

Using 4-bit precision

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)

input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
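
To give a rough sense of what the precisions above mean for GPU memory, the sketch below estimates the size of the weights alone at each precision. The parameter count is an assumption (roughly 2.6 billion for Gemma-2-2B; check the base model card for the exact figure), and `weight_memory_gib` is a hypothetical helper.

```python
# Assumption: roughly 2.6 billion parameters for Gemma-2-2B.
APPROX_PARAMS = 2.6e9

# Storage cost per parameter at each precision, in bytes.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16 / bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3


estimates = {
    name: weight_memory_gib(APPROX_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}
for name, gib in estimates.items():
    print(f"{name:>20}: ~{gib:.1f} GiB")
```

Actual usage will be higher: activations and the key-value cache need additional memory, and `bitsandbytes` quantizes the linear layers while leaving other parts, such as the embeddings, in higher precision.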

Other optimizations

Flash Attention 2

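First, make sure flash-attn is installed in your environment: pip install flash-attn

import torch
from transformers import AutoModelForCausalLM

model_id = "saidines12/telugu-news-headline-generation"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2"
).to(0)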

Inputs and outputs

Input: Text string consisting of the headline-generation instruction followed by the text of a Telugu news article.
Output: Generated Telugu-language text: a short headline for the input article.

Technical Specifications

Model Architecture and Objective

Base architecture: Gemma-2
Training objective: Supervised fine-tuning for headline generation
Gradient checkpointing enabled for memory efficiency
Optimized data loading with pinned memory

Software

PyTorch
Transformers library
TRL for supervised fine-tuning
CUDA for GPU acceleration

Uses

Direct Use

This model is designed for generating Telugu news headlines from article content. It can be used by:

News organizations for automated headline generation
Content creators working with Telugu news content
Researchers studying Telugu natural language generation

Out-of-Scope Use

The model should not be used for generating fake news or misleading headlines
Not suitable for non-Telugu content
Not designed for general text generation tasks
Should not be used for classification or other non-headline generation tasks

Bias, Risks, and Limitations

May reflect biases present in Telugu news media
Performance may vary based on news domain and writing style
Limited to the vocabulary and patterns present in the training data
May occasionally generate grammatically incorrect Telugu text
Could potentially generate sensationalized headlines

Recommendations

Use with human oversight for published content
Verify generated headlines for accuracy
Monitor output for potential biases
Implement content filtering for inappropriate generations
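
As one possible starting point for the monitoring and filtering recommended above, the sketch below flags headlines that are empty, unusually long, or mostly not in Telugu script, so they can be routed to a human reviewer. The function names and the thresholds (`min_ratio`, `max_chars`) are hypothetical choices for illustration, not values from this card, and a check like this does not detect factual errors, bias, or sensationalism.

```python
def telugu_ratio(text: str) -> float:
    """Fraction of non-space characters in the Telugu Unicode block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # The Telugu block spans U+0C00 to U+0C7F.
    telugu = sum(1 for c in chars if "\u0c00" <= c <= "\u0c7f")
    return telugu / len(chars)


def needs_review(headline: str, min_ratio: float = 0.5, max_chars: int = 120) -> bool:
    """Flag headlines that are empty, too long, or mostly non-Telugu."""
    headline = headline.strip()
    if not headline or len(headline) > max_chars:
        return True
    return telugu_ratio(headline) < min_ratio


print(needs_review("తెలుగు వార్తలు"))       # False
print(needs_review("Breaking news today"))  # True
```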
