https://github.com/baaivision/EVA/tree/master/EVA-CLIP

Summary of EVA-CLIP performance

Model Card

EVA-01-CLIP Series (MIM teacher: OpenAI CLIP-Large)

| model name | total #params | training precision | training data | training batch size | GPUs for training | IN-1K zero-shot top-1 | MSCOCO T2I R@5 | weight |
|---|---|---|---|---|---|---|---|---|
| EVA01_CLIP_g_14_psz14_s11B | 1.1B | fp16 | LAION-400M | 41K | 256× A100 (40GB) | 78.5 | 68.5 | 🤗 HF link (2.2GB) |
| EVA01_CLIP_g_14_plus_psz14_s11B | 1.3B | fp16 | Merged-2B | 114K | 112× A100 (40GB) | 79.3 | 74.0 | 🤗 HF link (2.7GB) |

EVA-02-CLIP Series (MIM teacher: EVA01_CLIP_g_14_psz14_s11B)

| model name | image enc. init. ckpt | text enc. init. ckpt | total #params | training precision | training data | training batch size | GPUs for training | IN-1K zero-shot top-1 | MSCOCO T2I R@5 | weight |
|---|---|---|---|---|---|---|---|---|---|---|
| EVA02_CLIP_B_psz16_s8B | EVA02_B_psz14to16 | openai/clip-vit-base-patch16 | 149M | fp16 | Merged-2B | 131K | 64× A100 (40GB) | 74.7 | 66.9 | 🤗 HF link (300MB) |
| EVA02_CLIP_L_psz14_s4B | EVA02_L_psz14 | openai/clip-vit-large-patch14 | 428M | fp16 | Merged-2B | 131K | 128× A100 (40GB) | 79.8 | 71.2 | 🤗 HF link (856MB) |
| EVA02_CLIP_L_336_psz14_s6B | EVA02_CLIP_L_psz14_224to336 | EVA02_CLIP_L_psz14_224to336 | 428M | fp16 | Merged-2B | 61K | 128× A100 (40GB) | 80.4 | 71.7 | 🤗 HF link (856MB) |
| EVA02_CLIP_E_psz14_s4B.pt | EVA02_E_psz14 | laion/CLIP-ViT-H-14-laion2B-s32B-b79K | 4.7B | fp16 | LAION-2B | 115K | 144× A100 (80GB) | 81.9 | 74.7 | 🤗 HF link (9.4GB) |
| EVA02_CLIP_E_psz14_plus_s9B.pt | EVA02_E_psz14 | laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | 5.0B | bf16 | LAION-2B | 144K | 144× A100 (80GB) | 82.0 | 75.0 | 🤗 HF link (10.1GB) |
  • To construct Merged-2B, we merged 1.6 billion samples from the LAION-2B dataset with 0.4 billion samples from COYO-700M.

  • To our knowledge, the EVA-CLIP series comprises the most performant open-source CLIP models at all scales, evaluated via zero-shot classification, especially on mainstream benchmarks such as ImageNet and its variants. For more details about EVA-CLIP, please refer to our paper.
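
For a quick start, here is a minimal zero-shot classification sketch using the open_clip library. The model and pretrained tags below ("EVA02-L-14", "merged2b_s4b_b131k") are assumptions about how these weights are registered; check `open_clip.list_pretrained()` for the exact EVA-CLIP identifiers shipped with your open_clip version.

```python
import torch
import open_clip
from PIL import Image

# Model/tag names are assumptions; run open_clip.list_pretrained()
# to find the EVA-CLIP identifiers available in your installation.
model, _, preprocess = open_clip.create_model_and_transforms(
    "EVA02-L-14", pretrained="merged2b_s4b_b131k"
)
tokenizer = open_clip.get_tokenizer("EVA02-L-14")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between L2-normalized embeddings, as in CLIP.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # zero-shot probabilities over the text prompts
```

The image path and prompts are placeholders; the softmax over scaled image-text similarities is the standard CLIP zero-shot recipe behind the IN-1K numbers above.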

Pretrained

| model name | total #params | training precision | download link |
|---|---|---|---|
| EVA01_g_psz14 | 1.0B | fp16 | 🤗 HF link (2.0GB) |
| EVA02_B_psz14to16 | 86M | fp16 | 🤗 HF link (176MB) |
| EVA02_L_psz14 | 304M | fp16 | 🤗 HF link (609MB) |
| EVA02_CLIP_L_psz14_224to336 | 428M | fp16 | 🤗 HF link (857MB) |
| EVA02_E_psz14 | 4.4B | fp16 | 🤗 HF link (8.7GB) |
| openai/clip-vit-base-patch16 | 149M | fp16 | 🤗 HF link (599MB) |
| openai/clip-vit-large-patch14 | 428M | fp16 | 🤗 HF link (1.7GB) |
| laion/CLIP-ViT-H-14-laion2B-s32B-b79K | 1.0B | bf16 | 🤗 HF link (3.9GB) |
| laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | 1.8B | bf16 | 🤗 HF link part1, part2 (9.9GB + 169MB) |
  • EVA02_B_psz14to16 interpolates the patch_embed kernel from 14x14 to 16x16 and interpolates the pos_embed from a 16x16 grid to a 14x14 grid (see the sketch after this list).

  • EVA02_CLIP_L_psz14_224to336 interpolates the pos_embed from 16x16 to 24x24 for training EVA02_CLIP_L_336_psz14_s6B.

  • laion/CLIP-ViT-bigG-14-laion2B-39B-b160k is split into two weight files, part1 and part2.
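
The interpolations described in the notes above amount to bicubic resampling of checkpoint tensors. Below is a minimal sketch, assuming a ViT-style state dict with `patch_embed.proj.weight` and `pos_embed` entries and a single [CLS] token; the key names and layout are illustrative, not the exact checkpoint schema.

```python
import torch
import torch.nn.functional as F

def resize_patch_embed(kernel: torch.Tensor, new_size: int) -> torch.Tensor:
    """Resample a conv patch-embedding kernel, e.g. 14x14 -> 16x16."""
    # kernel: (out_channels, in_channels, k, k); cast to fp32 for bicubic.
    return F.interpolate(
        kernel.float(), size=(new_size, new_size),
        mode="bicubic", align_corners=False,
    ).to(kernel.dtype)

def resize_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """Resample 2D position embeddings, e.g. a 16x16 grid -> 24x24."""
    # pos_embed: (1, 1 + old_grid**2, dim); the first token is [CLS].
    cls_token, grid = pos_embed[:, :1], pos_embed[:, 1:]
    dim = grid.shape[-1]
    grid = grid.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(
        grid.float(), size=(new_grid, new_grid),
        mode="bicubic", align_corners=False,
    ).to(pos_embed.dtype)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_token, grid], dim=1)

# E.g. for EVA02_B_psz14to16: patch kernel 14 -> 16, pos grid 16 -> 14;
# for EVA02_CLIP_L_psz14_224to336: pos grid 16 -> 24 (336 / 14 = 24).
```

Bicubic resampling of the position grid is the common recipe for changing input resolution: 224px at patch size 14 gives a 16x16 grid, while 336px gives 24x24.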
