Update README.md
README.md
CHANGED
@@ -84,7 +84,7 @@ The hyperparameters used for finetuning are listed in the following table.
 - **Training Strategy:**
   - Pretraining Stage
     - Learnable Component: MLP
-    - Data: Trained on 8192x4800=39.3M samples, including COYO, LAION, CC12M, CC3M, SBU, Wukong, GRIT, Objects365, OpenImages, and OCR
+    - Data: Trained on 8192x4800=39.3M samples, including COYO, LAION, CC12M, CC3M, SBU, Wukong, GRIT, Objects365, OpenImages, and OCR-related datasets.
     - Note: In this stage, we load the pretrained weights of [InternViT-6B-448px-V1-2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). Moreover, in order to reduce the number of visual tokens, we use a pixel shuffle to reduce 1024 tokens to 256 tokens.
   - Supervised Finetuning Stage
     - Learnable Component: ViT + MLP + LLM
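The Note line in the hunk above mentions using a pixel shuffle to reduce 1024 visual tokens to 256. A minimal sketch of that reduction is given below, assuming the ViT emits a square 32x32 grid of patch tokens and that a 2x2 space-to-depth (pixel unshuffle) folds each group of four neighboring tokens into one; the function name and tensor dimensions are illustrative, not the repository's actual API.

```python
import torch

def pixel_shuffle_tokens(x: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """Fold each scale x scale block of visual tokens into a single token.

    x: (batch, h*w, c) patch tokens laid out on an h x w grid.
    Returns: (batch, (h//scale)*(w//scale), c*scale*scale).
    """
    b, n, c = x.shape
    h = w = int(n ** 0.5)                        # assume a square token grid
    x = x.view(b, h, w, c)
    # split the grid into scale x scale neighborhoods
    x = x.view(b, h // scale, scale, w // scale, scale, c)
    # move the neighborhood dims next to the channel dim, then merge them into it
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(b, (h // scale) * (w // scale), c * scale * scale)

tokens = torch.randn(1, 1024, 3200)              # 32x32 grid from the ViT (hypothetical width)
reduced = pixel_shuffle_tokens(tokens, scale=2)
print(reduced.shape)                             # torch.Size([1, 256, 12800])
```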
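The Learnable Component lines state which modules receive gradients in each stage: only the MLP projector during pretraining, and ViT + MLP + LLM during supervised finetuning. A hedged sketch of how such stage-wise freezing might be configured is shown below; the attribute names `vision_model`, `mlp1`, and `language_model` are assumptions for illustration, not the project's confirmed training code.

```python
import torch.nn as nn

def set_trainable(model: nn.Module, stage: str) -> None:
    """Freeze/unfreeze modules per training stage (module names assumed for illustration).

    "pretrain": only the MLP projector is trainable.
    "finetune": ViT + MLP + LLM are all trainable.
    """
    full = stage == "finetune"
    for p in model.vision_model.parameters():    # ViT encoder
        p.requires_grad = full
    for p in model.language_model.parameters():  # LLM
        p.requires_grad = full
    for p in model.mlp1.parameters():            # MLP projector, trained in both stages
        p.requires_grad = True
```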