license: mit
metrics:
  - accuracy
  - mean_iou

Model Card for InternVL

This repository contains the PyTorch version of the InternVL model weights.

What is InternVL?

[Paper] [GitHub]

InternVL scales the vision transformer (ViT) up to 6B parameters and aligns it with a large language model (LLM).

At 14B parameters in total, it is the largest open-source vision/vision-language foundation model to date. It achieves state-of-the-art performance on 32 benchmarks across a wide range of tasks, including visual perception, cross-modal retrieval, and multimodal dialogue.

Pretrained Weights

| model name | type | download | size |
| --- | --- | --- | --- |
| InternViT-6B-224px | pytorch | 🤗 HF link | 12 GB |
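
As a rough sanity check on the listed download size, the sketch below estimates checkpoint size from parameter count and storage precision. It assumes a rounded count of 6 billion parameters and 2 bytes per parameter (16-bit weights). Both are assumptions made for illustration, not facts stated in this card, and `checkpoint_size_gb` is a hypothetical helper.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Estimate checkpoint size in GB (10**9 bytes), ignoring file-format overhead."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~6e9 parameters (rounded from "6B").
half_precision_gb = checkpoint_size_gb(6e9, 2)  # 16-bit weights (assumed)
full_precision_gb = checkpoint_size_gb(6e9, 4)  # 32-bit weights, for comparison

print(f"16-bit: {half_precision_gb:.1f} GB, 32-bit: {full_precision_gb:.1f} GB")
```

Under these assumptions the 16-bit estimate matches the 12 GB in the table, while 32-bit storage would need about twice that.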

Linear-Probe Image Classification

| model name | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| InternViT-6B-224px | 88.2 | 90.4 | 79.9 | 77.5 | 89.8 | 69.1 | ckpt \| log |
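
For readers unfamiliar with linear probing, here is a minimal PyTorch sketch of the general technique: the backbone is frozen and only a linear classifier on top of its features is trained. The tiny `backbone` below is a hypothetical stand-in, not InternViT-6B, and the data, sizes, and hyperparameters are made up for illustration. This is not the training recipe behind the numbers above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone (NOT InternViT-6B).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.GELU())
for p in backbone.parameters():
    p.requires_grad = False  # freeze every backbone weight
backbone.eval()

num_classes = 10  # assumed for the toy example
head = nn.Linear(32, num_classes)  # the only trainable module
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

# Random toy batch standing in for images and labels.
images = torch.randn(16, 3, 8, 8)
labels = torch.randint(0, num_classes, (16,))

backbone_before = [p.detach().clone() for p in backbone.parameters()]
head_before = head.weight.detach().clone()

# One training step: features come from the frozen backbone without gradients.
with torch.no_grad():
    features = backbone(images)
loss = nn.functional.cross_entropy(head(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(backbone_before, backbone.parameters())
)
head_changed = not torch.equal(head_before, head.weight)
print(backbone_unchanged, head_changed)
```

Because the optimizer only receives `head.parameters()`, the backbone stays bit-identical after the step, which is the defining property of a linear probe.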

Semantic Segmentation

| type | backbone | head | mIoU | config | download |
| --- | --- | --- | --- | --- | --- |
| few-shot (1/16) | InternViT-6B | Linear | 46.5 | config | ckpt \| log |
| few-shot (1/8) | InternViT-6B | Linear | 50.0 | config | ckpt \| log |
| few-shot (1/4) | InternViT-6B | Linear | 53.3 | config | ckpt \| log |
| few-shot (1/2) | InternViT-6B | Linear | 55.8 | config | ckpt \| log |
| few-shot (1/1) | InternViT-6B | Linear | 57.2 | config | ckpt \| log |
| linear probing | InternViT-6B (frozen) | Linear | 47.2 | config | ckpt \| log |
| head tuning | InternViT-6B (frozen) | UperNet | 54.9 | config | ckpt \| log |
| full tuning | InternViT-6B | UperNet | 58.9 | config | ckpt \| log |
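
The mIoU column reports mean intersection-over-union. As a reminder of how that metric is defined, the following pure-Python sketch computes it over flattened label lists. `mean_iou` is a hypothetical helper written for illustration, and the toy labels are made up. The evaluation code behind the table may differ in details such as ignored labels.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in either the prediction or the target.

    `pred` and `target` are flat sequences of integer class ids, one per pixel.
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5, which would be reported as 50.0 on the percentage scale used in the table.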