---
license: mit
language:
  - en
library_name: transformers
---

OpenFlamingo-9B-HF

This is a Hugging Face version of OpenFlamingo. To support Hugging Face's accelerator feature, we wrapped the original OpenFlamingo model as a Hugging Face model.

Detailed descriptions are available in the luodian/otter repository.

The current OF-9B-HF model supports fully sharded training and can be loaded onto consumer GPUs (e.g., an RTX 3090 with 24 GB of memory).
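As a rough sanity check on the consumer-GPU claim, the sketch below estimates how much memory the weights alone would need at different precisions. It assumes roughly 9 billion parameters (read off the "9B" in the model name) and a 24 GiB card, and it ignores activations, optimizer state, and framework overhead. It does not describe how this checkpoint is actually stored or loaded.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 9e9        # assumption: taken from the "9B" in the model name
GPU_MEMORY_GIB = 24     # assumption: a 24 GB card such as the 3090

for name, nbytes in (("fp32", 4), ("fp16/bf16", 2), ("int8", 1)):
    need = weight_memory_gib(NUM_PARAMS, nbytes)
    verdict = "fits" if need < GPU_MEMORY_GIB else "does not fit"
    print(f"{name}: {need:.1f} GiB of weights, {verdict} in {GPU_MEMORY_GIB} GiB")
```

Under these assumptions, full-precision weights exceed a 24 GiB card, while half precision leaves some headroom for activations.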

The following is the original OpenFlamingo model description.

Blog post | Code | Demo

OpenFlamingo is an open-source implementation of DeepMind's Flamingo models. OpenFlamingo-9B is built on CLIP ViT-L/14 and LLaMA-7B. Before using this model, please familiarize yourself with our terms and conditions.

Model Details

We freeze the pretrained vision encoder and language model, and then we train connecting Perceiver modules and cross-attention layers, following the original Flamingo paper.
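The freeze-then-train setup described above can be illustrated with a minimal PyTorch sketch. The class, attribute names, and layer sizes here are hypothetical stand-ins chosen for illustration; they are not the actual OpenFlamingo modules or training code.

```python
import torch
from torch import nn


class TinyFlamingoSketch(nn.Module):
    """Toy stand-in for the architecture; all names and sizes are hypothetical."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.vision_encoder = nn.Linear(dim, dim)  # stands in for CLIP ViT-L/14
        self.lang_model = nn.Linear(dim, dim)      # stands in for LLaMA-7B
        self.perceiver = nn.Linear(dim, dim)       # stands in for the Perceiver module
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)


def freeze_pretrained(model: TinyFlamingoSketch) -> list:
    """Freeze the pretrained parts and return the parameters left trainable."""
    for module in (model.vision_encoder, model.lang_model):
        for param in module.parameters():
            param.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


model = TinyFlamingoSketch()
trainable = freeze_pretrained(model)

# Only the connecting modules are handed to the optimizer.
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Passing only the unfrozen parameters to the optimizer keeps optimizer state small, since no state is allocated for the frozen encoder and language model.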

Our training data is a mixture of LAION 2B and a large interleaved image-text dataset called Multimodal C4, which will be released soon.

The current model is an early checkpoint of an ongoing effort. This checkpoint has seen 5 million interleaved image-text examples from Multimodal C4.

Uses

OpenFlamingo-9B is intended to be used for academic research purposes only. Commercial use is prohibited, in line with LLaMA's non-commercial license.

Bias, Risks, and Limitations

This model may generate inaccurate or offensive outputs, reflecting biases in its training data and pretrained priors.

In an effort to mitigate current potential biases and harms, we have deployed a content filter on model outputs in the OpenFlamingo demo. We continue to red-team the model to understand and improve its safety.

Evaluation

We've evaluated this checkpoint and report validation performance for two vision-language tasks: COCO captioning and VQAv2. Results are displayed below.

COCO (CIDEr)

| 0-shot | 4-shot | 8-shot | 16-shot | 32-shot |
|--------|--------|--------|---------|---------|
| 65.52 | 74.28 | 79.26 | 81.84 | 84.52 |

VQAv2 (VQA accuracy)

| 0-shot | 4-shot | 8-shot | 16-shot | 32-shot |
|--------|--------|--------|---------|---------|
| 43.55 | 44.05 | 47.5 | 48.87 | 50.34 |