GPT2 Instruction-Tuned English-to-German Headline Translation Model
- This model makes use of an English-to-German news headline translation dataset derived from the Harvard/abc-news-dataset for the task of instruction tuning.
- The translations in the dataset were generated using the LLaMA 3.1 and GPT-4o models.
- This model is a fine-tuned version of raghavbali/gpt2-finetuned-headliner.
Model description
This model leverages a Stanford Alpaca-style instruction-tuning dataset; the format is as follows:
`###Translate English Text to German:{text} ###Output: {translated_text}`
The format is slightly modified to reduce the number of additional tokens required for the instructions, since GPT2's context size is very limited. The model is trained on a small sample of ~5k examples to showcase the impact of instruction tuning on the overall alignment of the model towards the requested task.
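As a minimal sketch of how the model could be queried (the repo id is taken from the model tree below; the example headline and generation settings are illustrative assumptions, not part of the original card), a headline can be wrapped in the same compact instruction format and passed to a standard text-generation pipeline:

```python
from transformers import pipeline

# Hedged sketch: load the fine-tuned checkpoint as a plain GPT-2 text-generation pipeline.
generator = pipeline("text-generation", model="raghavbali/gpt2-instruct-tuned-translator2")

# Example headline (an assumption for illustration only).
headline = "Scientists discover new species of deep-sea fish"

# Reuse the instruction format described above.
prompt = f"###Translate English Text to German:{headline} ###Output:"

result = generator(
    prompt,
    max_new_tokens=40,       # headlines are short; stay well inside GPT2's context window
    do_sample=False,         # greedy decoding for a deterministic example
    return_full_text=False,  # only show the generated German continuation
)
print(result[0]["generated_text"].strip())
```

Given the limitations noted below, the output should be treated as a demonstration of instruction alignment rather than a reliable translation.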
Intended uses & limitations
This model is intended for learning purposes only. It seems to have picked up German vocabulary as well as sentence structure to a good extent, but the actual translations are at times grossly incorrect. The model also attempts to complete the news headlines given as prompts and has a high tendency to hallucinate.
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 4
- num_epochs: 1
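A minimal sketch of how these settings could map onto transformers TrainingArguments; only the hyperparameter values come from the list above, while the output directory is a placeholder assumption:

```python
from transformers import TrainingArguments

# Hedged sketch: values mirror the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="gpt2-instruct-tuned-translator2",  # placeholder, not from the original card
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=4,
    num_train_epochs=1,
)
```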
Training results
Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
Model tree for raghavbali/gpt2-instruct-tuned-translator2
- Base model: openai-community/gpt2-medium
- Finetuned from: raghavbali/gpt2-finetuned-headliner