Commit c83e8eb by czczup
Parent: 597ddaf

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -28,7 +28,7 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
  - Params (M): 5540 (the last 3 blocks are discarded)
  - Image size: 448 x 448
  - **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi, OCR data
- - **Note:** InternViT-6B originally had 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. For ease of use and to save GPU memory, we simply discarded the last 3 blocks. Now, the model has only 45 blocks and the number of parameters has been reduced from 5.9B to 5.5B. Therefore, if you want to build a VLLM based on this model, **please use the last layer of features.**
+ - **Note:** InternViT-6B originally had 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. For ease of use and to save GPU memory, we simply discarded the last 3 blocks. Now, the model has only 45 blocks and the number of parameters has been reduced from 5.9B to 5.5B. Therefore, if you want to build a VLLM based on this model, **please make use of the features from the last layer.**
  ## Model Usage (Image Embeddings)
 
  ```python
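# Illustrative sketch only (not part of this commit's diff): one way to pull
# image embeddings from the trimmed 45-block model via the standard
# `transformers` AutoModel / CLIPImageProcessor interface. The repo id and
# image path below are assumptions; substitute the actual values. Per the note
# above, the features to use are those of the model's last remaining layer,
# i.e. `last_hidden_state`.
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

repo_id = 'OpenGVLab/InternViT-6B-448px'  # assumed repo id, adjust as needed

model = AutoModel.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained(repo_id)

image = Image.open('example.jpg').convert('RGB')  # any RGB image, resized to 448 x 448
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

with torch.no_grad():
    outputs = model(pixel_values)

# Last-layer (45th block) token features -- the embeddings to build a VLLM on.
features = outputs.last_hidden_state  # shape: [1, num_tokens, hidden_dim]
```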