OpenNLPLab committed · Commit 3b65586 · 1 parent: c5d3093
Update README.md

README.md CHANGED
@@ -1,3 +1,191 @@
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- HGRN
- Recurrent Neural Network
---

- [HGRN](#hgrn)
  - [Overall Architecture](#overall-architecture)
  - [Experiments](#experiments)
    - [Environment Preparation](#environment-preparation)
      - [Env1](#env1)
      - [Env2](#env2)
    - [Autoregressive language model](#autoregressive-language-model)
      - [1) Preprocess the data](#1-preprocess-the-data)
      - [2) Train the autoregressive language model](#2-train-the-autoregressive-language-model)
    - [Image modeling](#image-modeling)
    - [LRA](#lra)
      - [1) Preparation](#1-preparation)
      - [2) Training](#2-training)
  - [Standalone code](#standalone-code)

## Overall Architecture

The overall network architecture is as follows:

<div align="center"> <img src="./hgrn.png" width = "100%" height = "100%" alt="network" align=center /></div>

## Experiments

### Environment Preparation

Our experiments use two conda environments: autoregressive language modeling uses the environment described in Env1, and LRA uses the one described in Env2.

#### Env1

First, build the conda environment based on the yaml file:

```
conda env create --file env1.yaml
```
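
To confirm that the environment was created, the output of `conda env list` can be checked for the expected name. The helper below is a minimal sketch and not part of this repository: `env_exists` is a hypothetical name, and it assumes `env1.yaml` names the environment `hgrn`, as the `conda activate hgrn` commands in this README suggest.

```shell
# Succeed if an environment named $1 appears in `conda env list` output
# supplied on stdin. (env_exists is a hypothetical helper, not a conda command.)
env_exists() {
    awk -v name="$1" '
        /^#/ || NF == 0 { next }    # skip header comments and blank lines
        $1 == name { found = 1 }    # the first column holds the environment name
        END { exit !found }         # exit 0 only if the name was seen
    '
}
```

For example, `conda env list | env_exists hgrn && echo "hgrn is available"`.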

If you hit an error while installing torch, remove torch and torchvision from the yaml file, rerun the command above, and then run the commands below:

```
conda activate hgrn
wget https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp36-cp36m-linux_x86_64.whl
pip install torch-1.8.1+cu111-cp36-cp36m-linux_x86_64.whl
pip install -r requirements_hgrn.txt
```
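
The wheel file name encodes what it targets: `cp36` means CPython 3.6 and `cu111` means a CUDA 11.1 build, so the wheel only installs into a matching interpreter. As a rough sketch, the two tags can be pulled out of the file name for comparison with the active environment. The function names are hypothetical, and the parsing assumes the name has no optional build tag.

```shell
# Print the Python tag (e.g. cp36) from a wheel file name of the form
# name-version-pythontag-abitag-platform.whl (assumes no build tag).
wheel_python_tag() {
    printf '%s\n' "$1" | cut -d- -f3
}

# Print the local version label after '+', e.g. cu111 from 1.8.1+cu111.
# If the version has no '+', the whole version string is printed.
wheel_cuda_tag() {
    version=$(printf '%s\n' "$1" | cut -d- -f2)
    printf '%s\n' "${version#*+}"
}
```

The Python tag can then be compared with the interpreter in the active environment, for instance with `python -c 'import sys; print("cp%d%d" % sys.version_info[:2])'`.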

Then, install `hgru-pytorch`:

```
conda activate hgrn
cd hgru-pytorch
pip install .
```

Finally, install our version of fairseq:

```
cd fairseq
pip install --editable ./
```
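
After the editable install, the fairseq command-line tools should be on `PATH` inside the `hgrn` environment. The following is a small sketch for checking that. `require_cmds` is a hypothetical helper, and `fairseq-preprocess` is used as the example only because the preprocessing step in this README calls it.

```shell
# Report every command in the argument list that cannot be found on PATH.
# Returns non-zero if at least one is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_cmds fairseq-preprocess || echo "fairseq is not installed in this environment"`.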

#### Env2

Build the conda environment based on the yaml file:

```
conda env create --file env2.yaml
```

If you encounter difficulties in setting up the environment, you can create the conda environment first and then use the following commands to install the pip packages:

```
pip install torch==1.10.0+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements_lra.txt
```

Finally, install `hgru-pytorch`:

```
conda activate lra
cd hgru-pytorch
pip install .
```

### Autoregressive language model

#### 1) Preprocess the data

First download the [WikiText-103 dataset](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/):

```
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
unzip wikitext-103-raw-v1.zip
```
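
Before encoding, it can help to confirm that the archive unpacked to the files the encoding step reads, namely `wikitext-103-raw/wiki.train.raw`, `wiki.valid.raw`, and `wiki.test.raw`. This is a minimal sketch; `check_raw_splits` is a hypothetical helper, not something shipped with fairseq or this repository.

```shell
# Print every missing or empty wiki.<split>.raw file under directory $1
# (default: wikitext-103-raw). Returns non-zero if any split is missing.
check_raw_splits() {
    dir=${1:-wikitext-103-raw}
    rc=0
    for split in train valid test; do
        if [ ! -s "$dir/wiki.$split.raw" ]; then
            echo "missing or empty: $dir/wiki.$split.raw" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_raw_splits` with no argument from the directory where the archive was unzipped checks the default location.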

Next, encode it with the GPT-2 BPE:

```
mkdir -p gpt2_bpe
wget -O gpt2_bpe/encoder.json https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
wget -O gpt2_bpe/vocab.bpe https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
for SPLIT in train valid test; do \
    python -m examples.roberta.multiprocessing_bpe_encoder \
        --encoder-json gpt2_bpe/encoder.json \
        --vocab-bpe gpt2_bpe/vocab.bpe \
        --inputs wikitext-103-raw/wiki.${SPLIT}.raw \
        --outputs wikitext-103-raw/wiki.${SPLIT}.bpe \
        --keep-empty \
        --workers 60; \
done
```
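
The `--workers 60` value presumes a machine with many cores and may oversubscribe a smaller one. One possible way to cap it at the number of available CPUs is sketched below. `pick_workers` is a hypothetical helper, and the sketch assumes `nproc` or `getconf _NPROCESSORS_ONLN` is available, as on most Linux systems.

```shell
# Print min(requested, available CPUs). Falls back to the requested value
# when neither nproc nor getconf can report a CPU count.
pick_workers() {
    requested=$1
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo "$requested")
    if [ "$cpus" -lt "$requested" ]; then
        echo "$cpus"
    else
        echo "$requested"
    fi
}
```

The result can be passed on the command line as `--workers "$(pick_workers 60)"`.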

Finally, preprocess/binarize the data using the GPT-2 fairseq dictionary:

```
wget -O gpt2_bpe/dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
fairseq-preprocess \
    --only-source \
    --srcdict gpt2_bpe/dict.txt \
    --trainpref wikitext-103-raw/wiki.train.bpe \
    --validpref wikitext-103-raw/wiki.valid.bpe \
    --testpref wikitext-103-raw/wiki.test.bpe \
    --destdir data-bin/wikitext-103 \
    --workers 60
```

This step comes from [fairseq](https://github.com/facebookresearch/fairseq/blob/main/examples/roberta/README.pretraining.md).

#### 2) Train the autoregressive language model

Use the following command to train the language model:

```
bash script_alm.sh
```

Change `data_dir` to the path of the preprocessed data (`data-bin/wikitext-103` if you followed the steps above).

### Image modeling

```
bash script_im.sh
```

### LRA

#### 1) Preparation

Download the codebase:

```
git clone https://github.com/OpenNLPLab/lra.git
```

Download the data:

```
wget https://storage.googleapis.com/long-range-arena/lra_release.gz
mv lra_release.gz lra_release.tar.gz
tar -xvf lra_release.tar.gz
```
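
Because the download is large, a truncated transfer is worth ruling out before or after extraction. A minimal sketch, assuming the `gzip` tool is installed; `is_gzip` is a hypothetical helper name.

```shell
# Succeed if $1 is a complete, intact gzip stream. Reading from stdin keeps
# the check independent of the file's suffix.
is_gzip() {
    gzip -t < "$1" 2>/dev/null
}
```

For example, `is_gzip lra_release.tar.gz || echo "archive is incomplete or not gzip; download it again"`.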

#### 2) Training

Use the following script to run the experiments. Change `PREFIX` to your lra path and `tasks` to the specific task you want to run:

```
python script_lra.py
```

## Standalone code

See [hgru-pytorch](https://github.com/Doraemonzzz/hgru-pytorch).