Model Card for Gemma-2-2B-it Telugu News Headline Generator
This model is a fine-tuned version of Google's Gemma-2-2B instruction-tuned model (google/gemma-2-2b-it), optimized for generating Telugu news headlines from article content. It was trained with Supervised Fine-Tuning (SFT) on a Telugu news dataset.
Model Details
Model Description
- Developed by: Google (base model), fine-tuned for Telugu news headline generation
- Model type: Decoder-only transformer language model
- Language(s): Telugu
- License: Apache 2.0
- Finetuned from model: Gemma-2-2B-it (google/gemma-2-2b-it)
Model Sources
- Repository: saidines12/telugu-news-headline-generation on the Hugging Face Hub
- Base Model: google/gemma-2-2b-it
How to Get Started with the Model
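from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation")
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
# The prompt is the instruction followed by the article text
text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
inputs = tokenizer(text, return_tensors="pt")
# max_new_tokens bounds the headline length; 64 is an illustrative value, not a tuned one
outputs = model.generate(**inputs, max_new_tokens=64)
# The decoded string contains the prompt followed by the generated headline
headline = tokenizer.decode(outputs[0], skip_special_tokens=True)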
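Because decoding `outputs[0]` returns the prompt together with the generated text, it can help to wrap prompt construction and prompt stripping in small helpers. The following is a minimal sketch rather than the author's code: `build_prompt` and `generate_headline` are hypothetical helper names, and `max_new_tokens=64` is an assumed value.

```python
# Instruction text taken verbatim from the usage example above.
INSTRUCTION = (
    "Generate relevant, interesting, factual short headline "
    "from this news article in telugu language"
)


def build_prompt(article: str) -> str:
    """Join instruction and article the same way the example prompt does."""
    return f"{INSTRUCTION}\n {article.strip()}"


def generate_headline(article, model, tokenizer, max_new_tokens=64):
    """Hypothetical helper: generate a headline and drop the echoed prompt.

    `model` and `tokenizer` are the objects returned by from_pretrained.
    """
    inputs = tokenizer(build_prompt(article), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # A causal LM returns the prompt tokens followed by the new tokens,
    # so slicing at the prompt length keeps only the generated part.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()
```

With a loaded model, `generate_headline(article_text, model, tokenizer)` would return only the headline string.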
Training Details
Training Data
- Telugu news articles and headlines dataset
- Data cleaned and preprocessed for headline generation task
- Articles spanning various news categories
Training Procedure
Training Hyperparameters
- Training regime: FP16 mixed precision
- Batch size: 6 per device
- Gradient accumulation steps: 4
- Learning rate: 2e-4
- Maximum steps: 20,000
- Warmup steps: 25
- Optimizer: AdamW
- Evaluation strategy: Every 20,000 steps
Hardware Specifications
- GPU training with gradient checkpointing
- Parallel data loading with 8 workers
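The settings listed above can be collected into one configuration and used to work out the effective batch size. This is a sketch under stated assumptions: the keys follow `transformers.TrainingArguments` field names, `"adamw_torch"` is an assumed AdamW variant, and the card does not show the actual training script.

```python
# Values copied from the hyperparameter and hardware lists above.
# Keys mirror transformers.TrainingArguments fields (an assumption about
# how the run was configured).
training_args = {
    "fp16": True,
    "per_device_train_batch_size": 6,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "max_steps": 20_000,
    "warmup_steps": 25,
    "optim": "adamw_torch",  # assumed; the card only says "AdamW"
    "eval_steps": 20_000,
    "gradient_checkpointing": True,
    "dataloader_num_workers": 8,
}

# Sequences contributing to each optimizer update on one device.
effective_batch_per_device = (
    training_args["per_device_train_batch_size"]
    * training_args["gradient_accumulation_steps"]
)

# Upper bound on sequences one device processes over the run,
# assuming max_steps counts optimizer updates.
sequences_per_device = effective_batch_per_device * training_args["max_steps"]

print(effective_batch_per_device)  # 24
print(sequences_per_device)        # 480000
```

Since evaluation runs every 20,000 steps and training stops at 20,000 steps, these settings imply a single evaluation at the end of training.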
Evaluation
ROUGE Score Comparison
| Metric | Base Model | Finetuned Model | Improvement |
|---|---|---|---|
| ROUGE-1 | 2.85 | 4.67 | +1.82 |
| ROUGE-2 | 0.25 | 0.41 | +0.17 |
| ROUGE-L | 2.84 | 4.65 | +1.81 |
Model Prediction Comparison (Evaluated by a Larger Model)
| Category | Count | Percentage |
|---|---|---|
| Total samples | 5962 | 100% |
| Same predictions | 1 | 0.02% |
| Better predictions | 4697 | 78.78% |
| Worse predictions | 1264 | 21.20% |
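The percentages in the table follow directly from the counts. As a quick arithmetic check (the dictionary keys are illustrative labels, not names from the evaluation code):

```python
# Counts copied from the prediction comparison table.
counts = {"same": 1, "better": 4697, "worse": 1264}

total = sum(counts.values())
percentages = {k: round(100 * v / total, 2) for k, v in counts.items()}

print(total)        # 5962
print(percentages)  # {'same': 0.02, 'better': 78.78, 'worse': 21.2}
```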
Evaluation Methods
- ROUGE scores for headline similarity
- Human evaluation for headline appropriateness
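To make the ROUGE metric concrete, here is a minimal sketch of a ROUGE-N F1 score between a reference headline and a generated one. It assumes whitespace tokenization and no stemming; the card does not say which ROUGE implementation or tokenizer produced the reported scores, so numbers from this sketch are not expected to reproduce the table.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N F1 using whitespace tokens (an assumed tokenization)."""
    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n_f1("a b c d", "a b x")` has two overlapping unigrams, giving precision 2/3, recall 1/2, and F1 of 4/7.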
Inference
Running the model on a GPU using different precisions
- Using torch.float16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.float16)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
- Using torch.bfloat16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", device_map="auto", torch_dtype=torch.bfloat16)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
Quantized Versions through bitsandbytes
- Using 8-bit precision (int8)
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
- Using 4-bit precision
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("saidines12/telugu-news-headline-generation")
model = AutoModelForCausalLM.from_pretrained("saidines12/telugu-news-headline-generation", quantization_config=quantization_config)
input_text = "Generate relevant, interesting, factual short headline from this news article in telugu language\n <Your Telugu news article text here>"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
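The precision options above mainly trade weight memory for numerical fidelity. A rough, weights-only estimate can be sketched as follows. The parameter count is an assumption (a nominal 2e9, read off the "2B" in the model name); the real count differs, quantized loading may keep some layers in higher precision, and activations and the KV cache need additional memory.

```python
# Bytes needed to store one weight at each precision shown above.
BYTES_PER_PARAM = {"float16": 2.0, "bfloat16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Assumption: nominal parameter count, not the exact size of the checkpoint.
NOMINAL_PARAMS = 2e9

for precision in BYTES_PER_PARAM:
    print(precision, weight_memory_gb(NOMINAL_PARAMS, precision))
```

Under this assumption the weights need about 4 GB in 16-bit precision, 2 GB in int8, and 1 GB in 4-bit.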
Other optimizations
- Flash Attention 2
First, make sure flash-attn is installed in your environment: pip install flash-attn
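import torch
from transformers import AutoModelForCausalLM
model_id = "saidines12/telugu-news-headline-generation"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
).to(0)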
Inputs and outputs
- Input: Text string containing the instruction prompt followed by a Telugu news article.
- Output: Generated Telugu-language headline for the input article.
Technical Specifications
Model Architecture and Objective
- Base architecture: Gemma-2
- Training objective: Supervised fine-tuning for headline generation
- Gradient checkpointing enabled for memory efficiency
- Optimized data loading with pinned memory
Software
- PyTorch
- Transformers library
- TRL for supervised fine-tuning
- CUDA for GPU acceleration
Uses
Direct Use
This model is designed for generating Telugu news headlines from article content. It can be used by:
- News organizations for automated headline generation
- Content creators working with Telugu news content
- Researchers studying Telugu natural language generation
Out-of-Scope Use
- The model should not be used for generating fake news or misleading headlines
- Not suitable for non-Telugu content
- Not designed for general text generation tasks
- Should not be used for classification or other non-headline generation tasks
Bias, Risks, and Limitations
- May reflect biases present in Telugu news media
- Performance may vary based on news domain and writing style
- Limited to the vocabulary and patterns present in the training data
- May occasionally generate grammatically incorrect Telugu text
- Could potentially generate sensationalized headlines
Recommendations
- Use with human oversight for published content
- Verify generated headlines for accuracy
- Monitor output for potential biases
- Implement content filtering for inappropriate generations
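As one possible starting point for automated checks, the sketch below flags generated headlines that are empty, unusually long, or mostly not in Telugu script, so that they can be routed to a human reviewer. The function names and the thresholds (`min_ratio=0.6`, `max_chars=120`) are assumptions for illustration; a check like this is a coarse pre-filter and does not replace human oversight or fact verification.

```python
def telugu_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Telugu Unicode block (U+0C00 to U+0C7F)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum("\u0c00" <= c <= "\u0c7f" for c in chars) / len(chars)


def needs_review(headline: str, min_ratio: float = 0.6, max_chars: int = 120) -> bool:
    """Flag headlines that are empty, too long, or mostly non-Telugu."""
    headline = headline.strip()
    return (
        not headline
        or len(headline) > max_chars
        or telugu_ratio(headline) < min_ratio
    )
```

For instance, `needs_review("hello world")` returns `True` because the text contains no Telugu characters, while a short headline written entirely in Telugu script passes.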