# distilvit
This model is a work in progress. It is a fine-tuned version of the following base models (a usage sketch follows the list):
- a ViT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
- a DistilGPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
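A minimal captioning sketch for this encoder/decoder pair (an assumption based on the architecture above, not code from the card; `tarekziade/test-push` is this page's repository id and may differ from the final checkpoint name):

```python
# Captioning sketch: ViT encodes the image, DistilGPT-2 decodes a caption.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model_id = "tarekziade/test-push"  # assumed repository id; adjust as needed
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```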
This model was trained on:
- Flickr30k: https://huggingface.co/datasets/nlphuji/flickr30k
- COCO 2017: https://cocodataset.org
You can get that checkpoint at commit 3083a3cef6e3c8dd90df3f088074bbe836b0f403, as shown in the sketch below.
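A hedged sketch of doing so with the `revision` argument of `from_pretrained` (the repository id is again an assumption):

```python
# Load the pre-debiasing checkpoint by pinning the commit hash quoted above.
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained(
    "tarekziade/test-push",  # assumed repository id
    revision="3083a3cef6e3c8dd90df3f088074bbe836b0f403",
)
```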
It was then further fine-tuned on (see the loading sketch after this list):
- Flickr30k debiased: https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions
- DocOrNot: https://huggingface.co/datasets/Mozilla/docornot
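The loading sketch mentioned above (the dataset ids are copied from the links; everything else is illustrative):

```python
# Fetch the two fine-tuning datasets from the Hugging Face Hub.
from datasets import load_dataset

flickr_debiased = load_dataset("Mozilla/flickr30k-transformed-captions")
docornot = load_dataset("Mozilla/docornot")
```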
You can find the code used to create the model here: https://github.com/mozilla/distilvit
## Framework versions
- Transformers 4.40.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
## Evaluation results

Self-reported results on nlphuji/flickr30k:

| Metric | Value |
| --- | --- |
| ROUGE-1 | 43.006 |
| ROUGE-2 | 16.994 |
| ROUGE-L | 38.892 |
| ROUGE-LSUM | 38.888 |
| loss | 0.199 |
| gen_len | 11.327 |
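A hedged sketch of how ROUGE figures like these can be computed with the `evaluate` library (the captions below are placeholders, not model output):

```python
# Score generated captions against reference captions with ROUGE.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["a dog runs across a grassy field"]
references = ["a dog is running through the grass"]
print(rouge.compute(predictions=predictions, references=references))
# -> {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```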