---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---
## Model
llava-siglip-internlm2-1_8b-pretrain-v1 is a LLaVA checkpoint pretrained from [internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) on [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) with [XTuner](https://github.com/InternLM/xtuner). The pretraining phase took 5.5 hours on four NVIDIA RTX 4090 GPUs (see this [intermediate checkpoint](https://huggingface.co/StarCycle/llava-siglip-internlm2-1_8b-pretrain-v1)).

The total size of the model is around 2.2B parameters, which makes it suitable for embedded applications like robotics.

#### I just finished the pretraining phase of the model. I will release the fully finetuned model soon. You can also finetune your own version based on the checkpoint here.
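If you want to build on this checkpoint, a minimal sketch for grabbing it locally (git-lfs is required so that the actual weight files, not just LFS pointers, are downloaded):
```shell
git lfs install
git clone https://huggingface.co/StarCycle/llava-siglip-internlm2-1_8b-pretrain-v1
```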

## Installation
```
# We need the newest versions, so clone from GitHub
git clone https://github.com/huggingface/transformers/
git clone https://github.com/huggingface/peft
git clone https://github.com/InternLM/xtuner
```
Now replace the corresponding files in the cloned transformers and xtuner repositories with the source files in modified_transformers and modified_xtuner, as sketched below.
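A minimal sketch of that replacement, assuming modified_transformers and modified_xtuner mirror the directory layout of the cloned repositories (check the folder contents before copying):
```shell
# Overwrite the cloned sources with the modified files (layout assumption above)
cp -rf modified_transformers/* transformers/
cp -rf modified_xtuner/* xtuner/
```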

Then run
```
pip install -e './xtuner[deepspeed]'
apt install git-lfs
```
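As a quick smoke test of the install, listing XTuner's built-in configs should work without errors:
```shell
xtuner list-cfg
```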

## Common Errors
1.
```
command error: 'libGL.so.1: cannot open shared object file: No such file or directory'!
```
You can solve it by
```
# For Ubuntu
sudo apt-get update
sudo apt-get install libgl1-mesa-glx

# For CentOS and Fedora
sudo yum install mesa-libGL
```

2.
```
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
```
You can solve it by reinstalling numpy.
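For example (the environment-variable workaround is the one suggested by the error message itself):
```shell
pip uninstall -y numpy && pip install numpy
# or force the MKL threading layer instead
export MKL_SERVICE_FORCE_INTEL=1
```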

3.
```
ImportError:
InternLM2Converter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
```
You just need
```
pip install protobuf
```
4.
To use tensorboard to visualize the training loss curve:
```
pip install future tensorboard
```

5. If your training process is killed during data preprocessing, you can modify `map_num_proc` in xtuner/xtuner/dataset/huggingface.py:
```python
def process(dataset,
            do_dataset_tokenization=True,
            tokenizer=None,
            max_length=None,
            dataset_map_fn=None,
            template_map_fn=None,
            max_dataset_length=None,
            split='train',
            remove_unused_columns=False,
            rename_maps=[],
            shuffle_before_pack=True,
            pack_to_max_length=True,
            use_varlen_attn=False,
            input_ids_with_output=True,
            with_image_token=False,
            map_num_proc=32):  # modify it to a smaller number, e.g., 4
```

6. If you fail to load the model, check whether you installed git-lfs and actually downloaded the model file.
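A quick way to verify is to list the LFS-tracked files inside the cloned model folder; if the weight files are only a few hundred bytes, you only have LFS pointer stubs:
```shell
cd llava-siglip-internlm2-1_8b-pretrain-v1
git lfs ls-files   # lists files tracked by git-lfs
ls -lh             # the weight files should be large, not KB-sized stubs
```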

## Data preparation
1. File structure

```
# . means the llava-siglip-internlm2-1_8b-pretrain-v1 folder you cloned
./data/llava_data
├── LLaVA-Pretrain
    ├── blip_laion_cc_sbu_558k.json
    ├── blip_laion_cc_sbu_558k_meta.json
    └── images

```

2. Pretrain Data

LLaVA-Pretrain

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
```
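To match the file structure above, move the cloned dataset under ./data/llava_data; the images may ship as an archive, in which case the archive name below is an assumption to verify:
```shell
mkdir -p ./data/llava_data
mv LLaVA-Pretrain ./data/llava_data/
cd ./data/llava_data/LLaVA-Pretrain
# If the images arrive as a zip archive, unpack them into images/
unzip images.zip -d images
```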

3. Finetune Data

Please check the final release version.

## Cheers! Now train your own model!
1. Alignment module pretraining
```
# single GPU
xtuner train ./pretrain.py --deepspeed deepspeed_zero2

# multiple GPUs
NPROC_PER_NODE=4 xtuner train ./pretrain.py --deepspeed deepspeed_zero2
```

#### Remember to change the batch size and gradient accumulation parameters to fit your hardware, so that your GPU_num * batch_size * gradient_accumulation stays roughly equal to mine if you want to reproduce the result.
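These are plain variables near the top of the config; the names below are the ones XTuner configs typically use (verify them in your pretrain.py):
```shell
grep -nE "batch_size|accumulative_counts" ./pretrain.py
```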

The checkpoint and tensorboard logs are saved by default in ./work_dirs/. I only train it for 1 epoch, the same as the original LLaVA paper. Some research also reports that training for multiple epochs makes the model overfit the training dataset and perform worse in other domains.
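To inspect the curves yourself (with the tensorboard package from Common Errors #4 installed):
```shell
tensorboard --logdir ./work_dirs
```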

This is my loss curve for llava-siglip-internlm2-1_8b-pretrain-v1:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/642a298ae5f33939cf3ee600/geoWP80yE5wzG1e6ZJTEy.png)

And the learning rate curve:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/642a298ae5f33939cf3ee600/hy8ulNnvy1Y7fE1ZNnHRN.png)

2. Instruction following fine-tuning

Please check the final release version.