metadata
license: mit
metrics:
- accuracy
- mean_iou
Model Card for InternVL
This repository contains the PyTorch version of the InternVL model weights.
What is InternVL?
InternVL scales up the ViT to 6B parameters and aligns it with an LLM.
It is the largest open-source vision/vision-language foundation model (14B) to date, achieving 32 state-of-the-art results across a wide range of tasks, including visual perception, cross-modal retrieval, and multimodal dialogue.
Pretrained Weights
model name | type | download | size |
---|---|---|---|
InternViT-6B-224px | pytorch | 🤗 HF link | 12 GB |
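
As a rough sanity check on the listed size, the sketch below estimates checkpoint size from parameter count. It assumes the weights are stored in a 16-bit format (2 bytes per parameter), which this card does not state, and it rounds the parameter count to 6B. The helper name `checkpoint_size_gb` is hypothetical.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate checkpoint size in GB (10^9 bytes), ignoring file overhead."""
    return num_params * bytes_per_param / 1e9


# Assumption: roughly 6B parameters, as stated for the scaled-up ViT.
NUM_PARAMS = 6e9

size_16bit = checkpoint_size_gb(NUM_PARAMS, 2)  # fp16 / bf16 storage (assumed)
size_32bit = checkpoint_size_gb(NUM_PARAMS, 4)  # fp32 storage, for comparison

print(f"16-bit: ~{size_16bit:.0f} GB, 32-bit: ~{size_32bit:.0f} GB")
```

Under that assumption the 16-bit estimate lines up with the 12 GB in the table; a 32-bit checkpoint would be about twice as large.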
Linear-Probe Image Classification
model name | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
---|---|---|---|---|---|---|---|
InternViT-6B-224px | 88.2 | 90.4 | 79.9 | 77.5 | 89.8 | 69.1 | ckpt / log |
Semantic Segmentation
type | backbone | head | mIoU | config | download |
---|---|---|---|---|---|
few-shot (1/16) | InternViT-6B | Linear | 46.5 | config | ckpt / log |
few-shot (1/8) | InternViT-6B | Linear | 50.0 | config | ckpt / log |
few-shot (1/4) | InternViT-6B | Linear | 53.3 | config | ckpt / log |
few-shot (1/2) | InternViT-6B | Linear | 55.8 | config | ckpt / log |
few-shot (1/1) | InternViT-6B | Linear | 57.2 | config | ckpt / log |
linear probing | InternViT-6B (frozen) | Linear | 47.2 | config | ckpt / log |
head tuning | InternViT-6B (frozen) | UperNet | 54.9 | config | ckpt / log |
full tuning | InternViT-6B | UperNet | 58.9 | config | ckpt / log |
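
For readers unfamiliar with the mIoU column, here is a minimal sketch of how mean intersection-over-union can be computed from flattened per-pixel labels. It is an illustration only, not the evaluation code behind the numbers above: the function name `mean_iou` and the toy labels are hypothetical, and real segmentation toolkits typically accumulate intersections and unions over the whole validation set and skip an ignore label.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are equal-length sequences of integer class ids,
    one entry per pixel.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correct pixel counts once in both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)


# Toy example: six pixels, three classes (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU: {100 * score:.1f}")
```

The table reports mIoU on a 0 to 100 scale, which is why the sketch multiplies by 100 when printing.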