Update README.md
InternVL scales up the ViT to _**6B parameters**_ and aligns it with LLM.
It is _**the largest open-source vision/vision-language foundation model (14B)**_ to date, achieving _**32 state-of-the-art**_ performances on a wide range of tasks such as visual perception, cross-modal retrieval, multimodal dialogue, etc.

# Pretrained Weights
| model name         | type    | download                                                                                  |  size   |
| ------------------ | ------- | ----------------------------------------------------------------------------------------- | :-----: |
| InternViT-6B-224px | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px.pth) |  12 GB  |
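For scripted downloads, the Hugging Face links above follow the standard `resolve` endpoint pattern. A minimal stdlib-only sketch; the helper name and local path are illustrative, not part of the InternVL repo:

```python
import urllib.request
from pathlib import Path

# Direct-download form of the HF link above (resolve/ instead of blob/).
CKPT_URL = ("https://huggingface.co/OpenGVLab/InternVL/"
            "resolve/main/intern_vit_6b_224px.pth")

def download_checkpoint(url: str, dest: str) -> Path:
    """Download `url` to `dest`, skipping the transfer if the file already exists."""
    path = Path(dest)
    if not path.exists():
        urllib.request.urlretrieve(url, path)  # streams the response to disk
    return path

# Usage (the full checkpoint is ~12 GB, so plan disk space accordingly):
# download_checkpoint(CKPT_URL, "intern_vit_6b_224px.pth")
```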

# Linear-Probe Image Classification

| model name         | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
| ------------------ | :---: | :-----: | :---: | :--: | :--: | :-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternViT-6B-224px | 88.2  |  90.4   | 79.9  | 77.5 | 89.8 |   69.1    | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth) \| [log](https://github.com/OpenGVLab/InternVL/blob/main/classification/work_dirs/intern_vit_6b_1k_224/log_rank0.txt) |
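For context, linear probing keeps the backbone frozen, so each image reduces to a fixed feature vector, and only a linear classifier on top is trained. A toy sketch of that protocol with synthetic Gaussian features standing in for frozen InternViT-6B outputs (and a closed-form least-squares head instead of the usual softmax-plus-SGD probe):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, feat_dim = 300, 3, 16

# Stand-in for frozen backbone features: one Gaussian cluster per class.
centers = rng.normal(size=(n_classes, feat_dim)) * 3.0
labels = rng.integers(0, n_classes, size=n_samples)
features = centers[labels] + rng.normal(size=(n_samples, feat_dim))

# Fit only the linear head, by least squares on one-hot targets;
# the "backbone" (the feature extractor) is never updated.
onehot = np.eye(n_classes)[labels]
X = np.hstack([features, np.ones((n_samples, 1))])  # append a bias column
W, *_ = np.linalg.lstsq(X, onehot, rcond=None)       # shape: (feat_dim + 1, n_classes)

pred = np.argmax(X @ W, axis=1)
accuracy = (pred == labels).mean()
```

With well-separated clusters the probe recovers the labels almost perfectly; on real ImageNet features the same recipe yields the accuracies in the table.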

# Semantic Segmentation

| type            | backbone              |  head   | mIoU | config | download |
| --------------- | --------------------- | :-----: | :--: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |