This repository contains the PyTorch version of the InternVL model weights.

## What is InternVL?

\[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\]

It is trained using web-scale, noisy image-text pairs. The data are all publicly available.

It is _**the largest open-source vision/vision-language foundation model (14B)**_ to date, achieving _**state-of-the-art results on 32 benchmarks**_ across tasks such as visual perception, cross-modal retrieval, and multimodal dialogue.

## Pretrained Weights

| model name | type | download | size |
| ----------------------- | ------- | ---------------------------------------------------------------------------------------------- | :-----: |
| InternViT-6B-224px | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px.pth) | 12 GB |
| InternVL-C-13B-224px | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/internvl_c_13b_224px.pth) | 25.4 GB |
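
The checkpoints above are plain PyTorch weight files. As a quick sanity check after downloading, you can load one and inspect its tensors. A minimal sketch, assuming the file is saved locally under its original name; the nested-key unwrapping is an assumption about the file layout, not a documented format:

```python
import torch

# Load the downloaded checkpoint on CPU so no GPU is needed just to inspect it.
ckpt = torch.load("intern_vit_6b_224px.pth", map_location="cpu")

# Some checkpoints nest the weights under a key such as "module" or "model";
# unwrap if present (an assumption -- check your file's actual structure).
for key in ("module", "model", "state_dict"):
    if isinstance(ckpt, dict) and key in ckpt:
        ckpt = ckpt[key]

# Print a few parameter names and shapes to verify the download is intact.
for name, tensor in list(ckpt.items())[:5]:
    print(name, tuple(tensor.shape))
```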

## Linear-Probe Image Classification (ImageNet Series)

| model name | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
| ------------------ | :---: | :-----: | :---: | :--: | :--: | :-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| InternViT-6B-224px | 88.2 | 90.4 | 79.9 | 77.5 | 89.8 | 69.1 | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth) \| [log](https://github.com/OpenGVLab/InternVL/blob/main/classification/work_dirs/intern_vit_6b_1k_224/log_rank0.txt) |
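
For context, linear probing keeps the backbone frozen and trains only a single linear classifier on its output features. The sketch below illustrates the idea with random stand-in features; the 3200-dim feature width is an assumption about InternViT-6B's output, and this is not the project's actual evaluation script:

```python
import torch
import torch.nn as nn

# Stand-in for features extracted once from the frozen backbone:
# 512 images, 3200-dim features (assumed InternViT-6B output width).
feats = torch.randn(512, 3200)
labels = torch.randint(0, 1000, (512,))  # ImageNet-1K has 1000 classes

# The probe is a single linear layer; the backbone itself is never updated.
probe = nn.Linear(3200, 1000)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(probe(feats), labels)
    loss.backward()
    optimizer.step()
```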

## Semantic Segmentation (ADE20K)

| type | backbone | head | mIoU | config | download |
| --------------- | --------------------- | :-----: | :--: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| head tuning | InternViT-6B (frozen) | UperNet | 54.9 | [config](https://github.com/OpenGVLab/InternVL/blob/main/segmentation//configs/intern_vit_6b/head_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.log) |
| full tuning | InternViT-6B | UperNet | 58.9 | [config](https://github.com/OpenGVLab/InternVL/blob/main/segmentation//configs/intern_vit_6b/full_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.log) |
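
Since these segmentation models are built on MMSegmentation, inference can be sketched with its high-level API. A minimal sketch, assuming MMSegmentation 0.x is installed and the config/checkpoint from the table were downloaded locally; the paths and demo image name are placeholders:

```python
from mmseg.apis import inference_segmentor, init_segmentor

# Placeholder local paths for the config/checkpoint linked in the table above.
config_file = "upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.py"
checkpoint_file = "upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.pth"

model = init_segmentor(config_file, checkpoint_file, device="cuda:0")

# Returns a per-pixel map of ADE20K class indices for the input image.
result = inference_segmentor(model, "demo.jpg")
```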

## License

This project is released under the MIT license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.

## Citation

If you find this project useful in your research, please consider citing:

```bibtex
@article{chen2023internvl,
  title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2312.14238},
  year={2023}
}
```

## Acknowledgement

InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!