Update README.md
Browse files
README.md
CHANGED
@@ -17,7 +17,7 @@ pipeline_tag: image-feature-extraction
|
|
17 |
|
18 |
\[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\] \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
|
19 |
|
20 |
-
We developed InternViT-300M-448px by
|
21 |
|
22 |
## Model Details
|
23 |
- **Model Type:** vision foundation model, feature backbone
|
|
|
17 |
|
18 |
\[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\] \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
|
19 |
|
20 |
+
This update primarily focuses on enhancing the efficiency of the vision foundation model. We developed InternViT-300M-448px by distilling knowledge from the robust vision foundation model, [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5). Like its predecessor, InternViT-300M-448px features a dynamic input resolution of 448×448, with a basic tile size of 448×448. During training, it allows for 1 to 12 tiles, and expands to 1 to 40 tiles during testing. Additionally, it inherits the powerful robustness, OCR capability, and high-resolution processing capacity from InternViT-6B-448px-V1-5.
|
21 |
|
22 |
## Model Details
|
23 |
- **Model Type:** vision foundation model, feature backbone
|