metadata

license: mit
metrics:
  - accuracy
  - mean_iou

Model Card for InternVL

This repository contains the PyTorch version of the InternVL model weights.

What is InternVL?

[Paper] [GitHub]

InternVL scales up the vision transformer (ViT) to 6B parameters and aligns it with a large language model (LLM).

It is the largest open-source vision/vision-language foundation model (14B parameters) to date, and it achieves 32 state-of-the-art results across a wide range of tasks, including visual perception, cross-modal retrieval, and multimodal dialogue.

Pretrained Weights

model name	type	download	size
InternViT-6B-224px	pytorch	🤗 HF link	12 GB
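
As a rough sanity check on the download size listed above, the following sketch estimates checkpoint size from the parameter count. It assumes the weights are stored in a 16-bit format (fp16 or bf16), which this card does not state, and it uses the rounded "6B" figure, so treat the result as an approximation. The helper `checkpoint_size_gb` is a hypothetical name introduced only for this illustration.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate checkpoint size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# "6B parameters" as stated in this card (a rounded figure).
VIT_PARAMS = 6e9

# Assumption: weights stored with 2 bytes per parameter (fp16 or bf16).
size_16bit = checkpoint_size_gb(VIT_PARAMS, 2)
# For comparison: 4 bytes per parameter (fp32).
size_32bit = checkpoint_size_gb(VIT_PARAMS, 4)

print(f"16-bit: {size_16bit:.0f} GB, 32-bit: {size_32bit:.0f} GB")
```

Under these assumptions the 16-bit estimate is about 12 GB, which is consistent with the size in the table; loading the same weights in 32-bit precision would need roughly twice that much memory.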

Linear-Probe Image Classification

model name	IN-1K	IN-ReaL	IN-V2	IN-A	IN-R	IN-Sketch	download
InternViT-6B-224px	88.2	90.4	79.9	77.5	89.8	69.1	ckpt | log

Semantic Segmentation

type	backbone	head	mIoU	config	download
few-shot (1/16)	InternViT-6B	Linear	46.5	config	ckpt | log
few-shot (1/8)	InternViT-6B	Linear	50.0	config	ckpt | log
few-shot (1/4)	InternViT-6B	Linear	53.3	config	ckpt | log
few-shot (1/2)	InternViT-6B	Linear	55.8	config	ckpt | log
few-shot (1/1)	InternViT-6B	Linear	57.2	config	ckpt | log
linear probing	InternViT-6B (frozen)	Linear	47.2	config	ckpt | log
head tuning	InternViT-6B (frozen)	UperNet	54.9	config	ckpt | log
full tuning	InternViT-6B	UperNet	58.9	config	ckpt | log
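
For readers unfamiliar with the mIoU column, the sketch below shows the generic definition of mean intersection-over-union on flat lists of per-pixel class labels. It is not the evaluation code behind the numbers above: real segmentation toolkits usually accumulate counts over a whole dataset and handle ignored labels, and the function name `mean_iou` and the toy inputs are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in either the prediction or the target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correct pixel counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Skip classes that occur in neither list to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example: 6 pixels, 3 classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]

miou = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {miou:.3f}")  # mIoU = 0.500
```

Here the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. The table reports the same quantity as a percentage, so this toy result would appear as 50.0.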