huseinzol05 committed on
Commit
c2bdb69
1 Parent(s): 8f5b1d8
.gitignore ADDED
@@ -0,0 +1 @@
1
+ *.ipynb_checkpoints
README.md ADDED
@@ -0,0 +1,69 @@
1
+ ---
2
+ language: ms
3
+ ---
4
+
5
+ # t5-small-bahasa-cased
6
+
7
+ Pretrained T5 small language model for Malay.
8
+
9
+ ## Pretraining Corpus
10
+
11
+ The `t5-small-bahasa-cased` model was pretrained on multiple tasks. Below is the list of tasks we trained on:
12
+
13
+ 1. Language masking task on bahasa news, bahasa Wikipedia, bahasa Academia.edu, bahasa parliament and translated The Pile.
14
+ 2. News title prediction on bahasa news.
15
+ 3. Next sentence prediction on bahasa news, bahasa Wikipedia, bahasa Academia.edu, bahasa parliament and translated The Pile.
16
+ 4. Question answering on translated Natural QA.
17
+ 5. Text Similarity task on translated SNLI and translated MNLI.
18
+ 6. EN-MS translation.
19
+ 7. MS-EN translation.
20
+ 8. Abstractive Summarization.
21
+ 9. Knowledge Graph triples generation.
22
+ 10. Paraphrase.
23
+
24
+ The preparation steps can be reproduced from https://github.com/huseinzol05/malaya/tree/master/pretrained-model/t5/prepare
25
+
26
+ ## Pretraining details
27
+
28
+ - This model was trained using the Google T5 repository, https://github.com/google-research/text-to-text-transfer-transformer, on a v3-8 TPU.
29
+ - All steps can be reproduced from https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/t5
30
+
31
+ ## Load Pretrained Model
32
+
33
+ To use this model, install `torch` or `tensorflow` together with the Hugging Face `transformers` library, then initialize it directly like this:
34
+
35
+ ```python
36
+ from transformers import T5Tokenizer, T5Model
37
+
38
+ model = T5Model.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
39
+ tokenizer = T5Tokenizer.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
40
+ ```
41
+
42
+ ## Example using T5ForConditionalGeneration
43
+
44
+ ```python
45
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
46
+
47
+ tokenizer = T5Tokenizer.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
48
+ model = T5ForConditionalGeneration.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
49
+ input_ids = tokenizer.encode('soalan: siapakah perdana menteri malaysia?', return_tensors = 'pt')
50
+ outputs = model.generate(input_ids)
51
+ print(tokenizer.decode(outputs[0]))
52
+ ```
53
+
54
+ The output is:
55
+
56
+ ```
57
+ <pad> Mahathir Mohamad</s>
58
+ ```
59
+
60
+ ## Supported prefixes
61
+
62
+ 1. `soalan: {string}`, trained using Natural QA.
63
+ 2. `ringkasan: {string}`, for abstractive summarization.
64
+ 3. `tajuk: {string}`, for abstractive title generation.
65
+ 4. `parafrasa: {string}`, for abstractive paraphrasing.
66
+ 5. `terjemah Inggeris ke Melayu: {string}`, for EN-MS translation.
67
+ 6. `terjemah Melayu ke Inggeris: {string}`, for MS-EN translation.
68
+ 7. `grafik pengetahuan: {string}`, for MS text to EN Knowledge Graph triples format.
69
+ 8. `ayat1: {string1} ayat2: {string2}`, semantic similarity.
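The prefixes listed above can be wrapped in a small helper so that callers do not have to remember the exact strings. This is a minimal sketch and not part of the repository: the task keys in `PREFIXES` and the `build_prompt` function are hypothetical names chosen for illustration, and only the prefix strings themselves come from the README.

```python
# Hypothetical helper: the task keys on the left are invented for this
# sketch; the prefix strings on the right are the ones listed in the README.
PREFIXES = {
    "qa": "soalan",
    "summarize": "ringkasan",
    "title": "tajuk",
    "paraphrase": "parafrasa",
    "en_to_ms": "terjemah Inggeris ke Melayu",
    "ms_to_en": "terjemah Melayu ke Inggeris",
    "knowledge_graph": "grafik pengetahuan",
}


def build_prompt(task, text, text2=None):
    """Return the prefixed input string for one of the supported tasks."""
    if task == "similarity":
        # The similarity task takes two sentences in a single input string.
        if text2 is None:
            raise ValueError("the similarity task needs two strings")
        return f"ayat1: {text} ayat2: {text2}"
    if task not in PREFIXES:
        raise KeyError(f"unknown task: {task}")
    return f"{PREFIXES[task]}: {text}"


print(build_prompt("qa", "siapakah perdana menteri malaysia?"))
```

The printed string is the same input used in the `T5ForConditionalGeneration` example, so the helper's result can be passed straight to `tokenizer.encode`.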
config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "_name_or_path": "./pytorch_model.bin",
3
+ "architectures": [
4
+ "T5Model"
5
+ ],
6
+ "d_ff": 2048,
7
+ "d_kv": 64,
8
+ "d_model": 512,
9
+ "decoder_start_token_id": 0,
10
+ "dropout_rate": 0.1,
11
+ "eos_token_id": 1,
12
+ "feed_forward_proj": "relu",
13
+ "gradient_checkpointing": false,
14
+ "initializer_factor": 1.0,
15
+ "inputs_length": 1024,
16
+ "is_encoder_decoder": true,
17
+ "layer_norm_epsilon": 1e-06,
18
+ "model_type": "t5",
19
+ "n_positions": 1024,
20
+ "num_decoder_layers": 6,
21
+ "num_heads": 8,
22
+ "num_layers": 6,
23
+ "pad_token_id": 0,
24
+ "relative_attention_num_buckets": 32,
25
+ "torch_dtype": "float32",
26
+ "transformers_version": "4.10.0",
27
+ "use_cache": true,
28
+ "vocab_size": 32128
29
+ }
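As a sanity check on the configuration above, the parameter count can be estimated from the config values alone. The sketch below assumes the layer layout shown in the `T5Model` printout in `convert-from-malaya.ipynb`: bias-free linear layers, one weight vector per `T5LayerNorm`, a relative attention bias only in the first block of each stack, and a single embedding shared by encoder and decoder. The `count_t5_parameters` function is a hypothetical name for illustration.

```python
# Values copied from config.json.
config = {
    "vocab_size": 32128,
    "d_model": 512,
    "d_ff": 2048,
    "d_kv": 64,
    "num_heads": 8,
    "num_layers": 6,
    "num_decoder_layers": 6,
    "relative_attention_num_buckets": 32,
}


def count_t5_parameters(cfg):
    """Estimate the number of weights in a T5Model with this config."""
    d_model = cfg["d_model"]
    inner = cfg["num_heads"] * cfg["d_kv"]            # attention inner width
    embedding = cfg["vocab_size"] * d_model           # shared embedding
    attention = 4 * d_model * inner                   # q, k, v, o (no bias)
    feed_forward = 2 * d_model * cfg["d_ff"]          # wi and wo (no bias)
    layer_norm = d_model                              # weight only
    rel_bias = cfg["relative_attention_num_buckets"] * cfg["num_heads"]

    # Encoder block: self-attention + feed-forward, each with a layer norm.
    encoder_layer = attention + feed_forward + 2 * layer_norm
    # Decoder block: self-attention + cross-attention + feed-forward.
    decoder_layer = 2 * attention + feed_forward + 3 * layer_norm

    encoder = cfg["num_layers"] * encoder_layer + rel_bias + layer_norm
    decoder = cfg["num_decoder_layers"] * decoder_layer + rel_bias + layer_norm
    return embedding + encoder + decoder


total = count_t5_parameters(config)
print(total)      # number of parameters
print(total * 4)  # bytes when stored as float32
```

Under these assumptions the count is 60,506,624 parameters, or 242,026,496 bytes at `float32`. That is slightly below the 242,087,629 bytes recorded for `pytorch_model.bin` in this commit; the remainder is presumably serialization overhead.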
convert-from-malaya.ipynb ADDED
@@ -0,0 +1,734 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {
7
+ "scrolled": true
8
+ },
9
+ "outputs": [
10
+ {
11
+ "data": {
12
+ "text/plain": [
13
+ "'4.10.0'"
14
+ ]
15
+ },
16
+ "execution_count": 1,
17
+ "metadata": {},
18
+ "output_type": "execute_result"
19
+ }
20
+ ],
21
+ "source": [
22
+ "import transformers\n",
23
+ "transformers.__version__"
24
+ ]
25
+ },
26
+ {
27
+ "cell_type": "code",
28
+ "execution_count": 2,
29
+ "metadata": {},
30
+ "outputs": [],
31
+ "source": [
32
+ "from transformers import T5Config, T5Model, load_tf_weights_in_t5"
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "code",
37
+ "execution_count": 7,
38
+ "metadata": {},
39
+ "outputs": [
40
+ {
41
+ "name": "stdout",
42
+ "output_type": "stream",
43
+ "text": [
44
+ "checkpoint model.ckpt-1000000.index\r\n",
45
+ "model.ckpt-1000000.data-00000-of-00002 model.ckpt-1000000.meta\r\n",
46
+ "model.ckpt-1000000.data-00001-of-00002 operative_config.gin\r\n"
47
+ ]
48
+ }
49
+ ],
50
+ "source": [
51
+ "# !wget https://f000.backblazeb2.com/file/malaya-model/pretrained/t5-small-2021-07-28.tar.gz\n",
52
+ "# !tar -zxf t5-small-2021-07-28.tar.gz\n",
53
+ "# !rm t5-small-2021-07-28.tar.gz\n",
54
+ "!ls t5-small-v2"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "code",
59
+ "execution_count": 6,
60
+ "metadata": {},
61
+ "outputs": [
62
+ {
63
+ "name": "stdout",
64
+ "output_type": "stream",
65
+ "text": [
66
+ "T5Config {\n",
67
+ " \"d_ff\": 2048,\n",
68
+ " \"d_kv\": 64,\n",
69
+ " \"d_model\": 512,\n",
70
+ " \"decoder_start_token_id\": 0,\n",
71
+ " \"dropout_rate\": 0.1,\n",
72
+ " \"eos_token_id\": 1,\n",
73
+ " \"feed_forward_proj\": \"relu\",\n",
74
+ " \"gradient_checkpointing\": false,\n",
75
+ " \"initializer_factor\": 1.0,\n",
76
+ " \"inputs_length\": 1024,\n",
77
+ " \"is_encoder_decoder\": true,\n",
78
+ " \"layer_norm_epsilon\": 1e-06,\n",
79
+ " \"model_type\": \"t5\",\n",
80
+ " \"n_positions\": 1024,\n",
81
+ " \"num_decoder_layers\": 6,\n",
82
+ " \"num_heads\": 8,\n",
83
+ " \"num_layers\": 6,\n",
84
+ " \"pad_token_id\": 0,\n",
85
+ " \"relative_attention_num_buckets\": 32,\n",
86
+ " \"transformers_version\": \"4.10.0\",\n",
87
+ " \"use_cache\": true,\n",
88
+ " \"vocab_size\": 32128\n",
89
+ "}\n",
90
+ "\n"
91
+ ]
92
+ }
93
+ ],
94
+ "source": [
95
+ "config = T5Config(\n",
96
+ " vocab_size = 32128,\n",
97
+ " n_positions=1024,\n",
98
+ " d_ff = 2048,\n",
99
+ " d_kv = 64,\n",
100
+ " d_model = 512,\n",
101
+ " dropout_rate = 0.1,\n",
102
+ " inputs_length = 1024,\n",
103
+ " num_heads = 8,\n",
104
+ " num_layers = 6,\n",
105
+ " decoder_start_token_id = 0,\n",
106
+ " eos_token_id = 1,\n",
107
+ " pad_token_id = 0)\n",
108
+ "print(config)\n",
109
+ "config.save_pretrained('./')"
110
+ ]
111
+ },
112
+ {
113
+ "cell_type": "code",
114
+ "execution_count": 8,
115
+ "metadata": {},
116
+ "outputs": [
117
+ {
118
+ "data": {
119
+ "text/plain": [
120
+ "T5Model(\n",
121
+ " (shared): Embedding(32128, 512)\n",
122
+ " (encoder): T5Stack(\n",
123
+ " (embed_tokens): Embedding(32128, 512)\n",
124
+ " (block): ModuleList(\n",
125
+ " (0): T5Block(\n",
126
+ " (layer): ModuleList(\n",
127
+ " (0): T5LayerSelfAttention(\n",
128
+ " (SelfAttention): T5Attention(\n",
129
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
130
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
131
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
132
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
133
+ " (relative_attention_bias): Embedding(32, 8)\n",
134
+ " )\n",
135
+ " (layer_norm): T5LayerNorm()\n",
136
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
137
+ " )\n",
138
+ " (1): T5LayerFF(\n",
139
+ " (DenseReluDense): T5DenseReluDense(\n",
140
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
141
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
142
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
143
+ " )\n",
144
+ " (layer_norm): T5LayerNorm()\n",
145
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
146
+ " )\n",
147
+ " )\n",
148
+ " )\n",
149
+ " (1): T5Block(\n",
150
+ " (layer): ModuleList(\n",
151
+ " (0): T5LayerSelfAttention(\n",
152
+ " (SelfAttention): T5Attention(\n",
153
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
154
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
155
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
156
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
157
+ " )\n",
158
+ " (layer_norm): T5LayerNorm()\n",
159
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
160
+ " )\n",
161
+ " (1): T5LayerFF(\n",
162
+ " (DenseReluDense): T5DenseReluDense(\n",
163
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
164
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
165
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
166
+ " )\n",
167
+ " (layer_norm): T5LayerNorm()\n",
168
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
169
+ " )\n",
170
+ " )\n",
171
+ " )\n",
172
+ " (2): T5Block(\n",
173
+ " (layer): ModuleList(\n",
174
+ " (0): T5LayerSelfAttention(\n",
175
+ " (SelfAttention): T5Attention(\n",
176
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
177
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
178
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
179
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
180
+ " )\n",
181
+ " (layer_norm): T5LayerNorm()\n",
182
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
183
+ " )\n",
184
+ " (1): T5LayerFF(\n",
185
+ " (DenseReluDense): T5DenseReluDense(\n",
186
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
187
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
188
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
189
+ " )\n",
190
+ " (layer_norm): T5LayerNorm()\n",
191
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
192
+ " )\n",
193
+ " )\n",
194
+ " )\n",
195
+ " (3): T5Block(\n",
196
+ " (layer): ModuleList(\n",
197
+ " (0): T5LayerSelfAttention(\n",
198
+ " (SelfAttention): T5Attention(\n",
199
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
200
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
201
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
202
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
203
+ " )\n",
204
+ " (layer_norm): T5LayerNorm()\n",
205
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
206
+ " )\n",
207
+ " (1): T5LayerFF(\n",
208
+ " (DenseReluDense): T5DenseReluDense(\n",
209
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
210
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
211
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
212
+ " )\n",
213
+ " (layer_norm): T5LayerNorm()\n",
214
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
215
+ " )\n",
216
+ " )\n",
217
+ " )\n",
218
+ " (4): T5Block(\n",
219
+ " (layer): ModuleList(\n",
220
+ " (0): T5LayerSelfAttention(\n",
221
+ " (SelfAttention): T5Attention(\n",
222
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
223
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
224
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
225
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
226
+ " )\n",
227
+ " (layer_norm): T5LayerNorm()\n",
228
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
229
+ " )\n",
230
+ " (1): T5LayerFF(\n",
231
+ " (DenseReluDense): T5DenseReluDense(\n",
232
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
233
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
234
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
235
+ " )\n",
236
+ " (layer_norm): T5LayerNorm()\n",
237
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
238
+ " )\n",
239
+ " )\n",
240
+ " )\n",
241
+ " (5): T5Block(\n",
242
+ " (layer): ModuleList(\n",
243
+ " (0): T5LayerSelfAttention(\n",
244
+ " (SelfAttention): T5Attention(\n",
245
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
246
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
247
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
248
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
249
+ " )\n",
250
+ " (layer_norm): T5LayerNorm()\n",
251
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
252
+ " )\n",
253
+ " (1): T5LayerFF(\n",
254
+ " (DenseReluDense): T5DenseReluDense(\n",
255
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
256
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
257
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
258
+ " )\n",
259
+ " (layer_norm): T5LayerNorm()\n",
260
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
261
+ " )\n",
262
+ " )\n",
263
+ " )\n",
264
+ " )\n",
265
+ " (final_layer_norm): T5LayerNorm()\n",
266
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
267
+ " )\n",
268
+ " (decoder): T5Stack(\n",
269
+ " (embed_tokens): Embedding(32128, 512)\n",
270
+ " (block): ModuleList(\n",
271
+ " (0): T5Block(\n",
272
+ " (layer): ModuleList(\n",
273
+ " (0): T5LayerSelfAttention(\n",
274
+ " (SelfAttention): T5Attention(\n",
275
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
276
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
277
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
278
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
279
+ " (relative_attention_bias): Embedding(32, 8)\n",
280
+ " )\n",
281
+ " (layer_norm): T5LayerNorm()\n",
282
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
283
+ " )\n",
284
+ " (1): T5LayerCrossAttention(\n",
285
+ " (EncDecAttention): T5Attention(\n",
286
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
287
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
288
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
289
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
290
+ " )\n",
291
+ " (layer_norm): T5LayerNorm()\n",
292
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
293
+ " )\n",
294
+ " (2): T5LayerFF(\n",
295
+ " (DenseReluDense): T5DenseReluDense(\n",
296
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
297
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
298
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
299
+ " )\n",
300
+ " (layer_norm): T5LayerNorm()\n",
301
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
302
+ " )\n",
303
+ " )\n",
304
+ " )\n",
305
+ " (1): T5Block(\n",
306
+ " (layer): ModuleList(\n",
307
+ " (0): T5LayerSelfAttention(\n",
308
+ " (SelfAttention): T5Attention(\n",
309
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
310
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
311
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
312
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
313
+ " )\n",
314
+ " (layer_norm): T5LayerNorm()\n",
315
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
316
+ " )\n",
317
+ " (1): T5LayerCrossAttention(\n",
318
+ " (EncDecAttention): T5Attention(\n",
319
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
320
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
321
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
322
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
323
+ " )\n",
324
+ " (layer_norm): T5LayerNorm()\n",
325
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
326
+ " )\n",
327
+ " (2): T5LayerFF(\n",
328
+ " (DenseReluDense): T5DenseReluDense(\n",
329
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
330
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
331
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
332
+ " )\n",
333
+ " (layer_norm): T5LayerNorm()\n",
334
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
335
+ " )\n",
336
+ " )\n",
337
+ " )\n",
338
+ " (2): T5Block(\n",
339
+ " (layer): ModuleList(\n",
340
+ " (0): T5LayerSelfAttention(\n",
341
+ " (SelfAttention): T5Attention(\n",
342
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
343
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
344
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
345
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
346
+ " )\n",
347
+ " (layer_norm): T5LayerNorm()\n",
348
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
349
+ " )\n",
350
+ " (1): T5LayerCrossAttention(\n",
351
+ " (EncDecAttention): T5Attention(\n",
352
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
353
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
354
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
355
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
356
+ " )\n",
357
+ " (layer_norm): T5LayerNorm()\n",
358
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
359
+ " )\n",
360
+ " (2): T5LayerFF(\n",
361
+ " (DenseReluDense): T5DenseReluDense(\n",
362
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
363
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
364
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
365
+ " )\n",
366
+ " (layer_norm): T5LayerNorm()\n",
367
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
368
+ " )\n",
369
+ " )\n",
370
+ " )\n",
371
+ " (3): T5Block(\n",
372
+ " (layer): ModuleList(\n",
373
+ " (0): T5LayerSelfAttention(\n",
374
+ " (SelfAttention): T5Attention(\n",
375
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
376
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
377
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
378
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
379
+ " )\n",
380
+ " (layer_norm): T5LayerNorm()\n",
381
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
382
+ " )\n",
383
+ " (1): T5LayerCrossAttention(\n",
384
+ " (EncDecAttention): T5Attention(\n",
385
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
386
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
387
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
388
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
389
+ " )\n",
390
+ " (layer_norm): T5LayerNorm()\n",
391
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
392
+ " )\n",
393
+ " (2): T5LayerFF(\n",
394
+ " (DenseReluDense): T5DenseReluDense(\n",
395
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
396
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
397
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
398
+ " )\n",
399
+ " (layer_norm): T5LayerNorm()\n",
400
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
401
+ " )\n",
402
+ " )\n",
403
+ " )\n",
404
+ " (4): T5Block(\n",
405
+ " (layer): ModuleList(\n",
406
+ " (0): T5LayerSelfAttention(\n",
407
+ " (SelfAttention): T5Attention(\n",
408
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
409
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
410
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
411
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
412
+ " )\n",
413
+ " (layer_norm): T5LayerNorm()\n",
414
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
415
+ " )\n",
416
+ " (1): T5LayerCrossAttention(\n",
417
+ " (EncDecAttention): T5Attention(\n",
418
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
419
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
420
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
421
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
422
+ " )\n",
423
+ " (layer_norm): T5LayerNorm()\n",
424
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
425
+ " )\n",
426
+ " (2): T5LayerFF(\n",
427
+ " (DenseReluDense): T5DenseReluDense(\n",
428
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
429
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
430
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
431
+ " )\n",
432
+ " (layer_norm): T5LayerNorm()\n",
433
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
434
+ " )\n",
435
+ " )\n",
436
+ " )\n",
437
+ " (5): T5Block(\n",
438
+ " (layer): ModuleList(\n",
439
+ " (0): T5LayerSelfAttention(\n",
440
+ " (SelfAttention): T5Attention(\n",
441
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
442
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
443
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
444
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
445
+ " )\n",
446
+ " (layer_norm): T5LayerNorm()\n",
447
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
448
+ " )\n",
449
+ " (1): T5LayerCrossAttention(\n",
450
+ " (EncDecAttention): T5Attention(\n",
451
+ " (q): Linear(in_features=512, out_features=512, bias=False)\n",
452
+ " (k): Linear(in_features=512, out_features=512, bias=False)\n",
453
+ " (v): Linear(in_features=512, out_features=512, bias=False)\n",
454
+ " (o): Linear(in_features=512, out_features=512, bias=False)\n",
455
+ " )\n",
456
+ " (layer_norm): T5LayerNorm()\n",
457
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
458
+ " )\n",
459
+ " (2): T5LayerFF(\n",
460
+ " (DenseReluDense): T5DenseReluDense(\n",
461
+ " (wi): Linear(in_features=512, out_features=2048, bias=False)\n",
462
+ " (wo): Linear(in_features=2048, out_features=512, bias=False)\n",
463
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
464
+ " )\n",
465
+ " (layer_norm): T5LayerNorm()\n",
466
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
467
+ " )\n",
468
+ " )\n",
469
+ " )\n",
470
+ " )\n",
471
+ " (final_layer_norm): T5LayerNorm()\n",
472
+ " (dropout): Dropout(p=0.1, inplace=False)\n",
473
+ " )\n",
474
+ ")"
475
+ ]
476
+ },
477
+ "execution_count": 8,
478
+ "metadata": {},
479
+ "output_type": "execute_result"
480
+ }
481
+ ],
482
+ "source": [
483
+ "model = T5Model(config)\n",
484
+ "load_tf_weights_in_t5(model, config, 't5-small-v2/model.ckpt-1000000')"
485
+ ]
486
+ },
487
+ {
488
+ "cell_type": "code",
489
+ "execution_count": 9,
490
+ "metadata": {},
491
+ "outputs": [
492
+ {
493
+ "data": {
494
+ "text/plain": [
495
+ "('config.json', 'pytorch_model.bin')"
496
+ ]
497
+ },
498
+ "execution_count": 9,
499
+ "metadata": {},
500
+ "output_type": "execute_result"
501
+ }
502
+ ],
503
+ "source": [
504
+ "from transformers import CONFIG_NAME, WEIGHTS_NAME\n",
505
+ "CONFIG_NAME, WEIGHTS_NAME"
506
+ ]
507
+ },
508
+ {
509
+ "cell_type": "code",
510
+ "execution_count": 10,
511
+ "metadata": {},
512
+ "outputs": [],
513
+ "source": [
514
+ "import torch\n",
515
+ "\n",
516
+ "torch.save(model.state_dict(), './' + WEIGHTS_NAME)"
517
+ ]
518
+ },
519
+ {
520
+ "cell_type": "code",
521
+ "execution_count": 11,
522
+ "metadata": {},
523
+ "outputs": [],
524
+ "source": [
525
+ "from transformers import T5Config, T5Model, T5Tokenizer"
526
+ ]
527
+ },
528
+ {
529
+ "cell_type": "code",
530
+ "execution_count": 12,
531
+ "metadata": {},
532
+ "outputs": [],
533
+ "source": [
534
+ "# !wget https://f000.backblazeb2.com/file/malaya-model/bpe/sp10m.cased.ms-en.model"
535
+ ]
536
+ },
537
+ {
538
+ "cell_type": "code",
539
+ "execution_count": 13,
540
+ "metadata": {},
541
+ "outputs": [
542
+ {
543
+ "data": {
544
+ "text/plain": [
545
+ "('./tokenizer_config.json',\n",
546
+ " './special_tokens_map.json',\n",
547
+ " './spiece.model',\n",
548
+ " './added_tokens.json')"
549
+ ]
550
+ },
551
+ "execution_count": 13,
552
+ "metadata": {},
553
+ "output_type": "execute_result"
554
+ }
555
+ ],
556
+ "source": [
557
+ "tokenizer = T5Tokenizer('sp10m.cased.ms-en.model')\n",
558
+ "tokenizer.save_pretrained('./')"
559
+ ]
560
+ },
561
+ {
562
+ "cell_type": "code",
563
+ "execution_count": 14,
564
+ "metadata": {},
565
+ "outputs": [],
566
+ "source": [
567
+ "tokenizer = T5Tokenizer.from_pretrained('./', lower = False)"
568
+ ]
569
+ },
570
+ {
571
+ "cell_type": "code",
572
+ "execution_count": 15,
573
+ "metadata": {},
574
+ "outputs": [],
575
+ "source": [
576
+ "config = T5Config.from_pretrained('./')"
577
+ ]
578
+ },
579
+ {
580
+ "cell_type": "code",
581
+ "execution_count": 16,
582
+ "metadata": {},
583
+ "outputs": [],
584
+ "source": [
585
+ "model = T5Model.from_pretrained('./pytorch_model.bin', config = config)"
586
+ ]
587
+ },
588
+ {
589
+ "cell_type": "code",
590
+ "execution_count": 17,
591
+ "metadata": {},
592
+ "outputs": [],
593
+ "source": [
594
+ "model.save_pretrained('./')"
595
+ ]
596
+ },
597
+ {
598
+ "cell_type": "code",
599
+ "execution_count": 18,
600
+ "metadata": {},
601
+ "outputs": [],
602
+ "source": [
603
+ "from transformers import T5Tokenizer, T5ForConditionalGeneration"
604
+ ]
605
+ },
606
+ {
607
+ "cell_type": "code",
608
+ "execution_count": 19,
609
+ "metadata": {},
610
+ "outputs": [],
611
+ "source": [
612
+ "model = T5ForConditionalGeneration.from_pretrained('./')"
613
+ ]
614
+ },
615
+ {
616
+ "cell_type": "code",
617
+ "execution_count": 20,
618
+ "metadata": {},
619
+ "outputs": [
620
+ {
621
+ "data": {
622
+ "text/plain": [
623
+ "'<pad> Mahathir Mohamad</s>'"
624
+ ]
625
+ },
626
+ "execution_count": 20,
627
+ "metadata": {},
628
+ "output_type": "execute_result"
629
+ }
630
+ ],
631
+ "source": [
632
+ "input_ids = tokenizer.encode('soalan: siapakah perdana menteri malaysia?', return_tensors = 'pt')\n",
633
+ "outputs = model.generate(input_ids)\n",
634
+ "tokenizer.decode(outputs[0])"
635
+ ]
636
+ },
637
+ {
638
+ "cell_type": "code",
639
+ "execution_count": 21,
640
+ "metadata": {},
641
+ "outputs": [
642
+ {
643
+ "data": {
644
+ "text/plain": [
645
+ "'<pad> PETALING JAYA: Bekas perdana menteri Najib Razak mempersoalkan sama ada kerajaan tahu bagaimana menguruskan wabak Covid'"
646
+ ]
647
+ },
648
+ "execution_count": 21,
649
+ "metadata": {},
650
+ "output_type": "execute_result"
651
+ }
652
+ ],
653
+ "source": [
654
+ "input_ids = tokenizer.encode('terjemah Inggeris ke Melayu: PETALING JAYA: Former prime minister Najib Razak has questioned whether the government knows how to manage the Covid-19 pandemic, outlining several seemingly contradictory announcements it has made.', return_tensors = 'pt')\n",
655
+ "outputs = model.generate(input_ids)\n",
656
+ "tokenizer.decode(outputs[0])"
657
+ ]
658
+ },
659
+ {
660
+ "cell_type": "code",
661
+ "execution_count": 22,
662
+ "metadata": {},
663
+ "outputs": [
664
+ {
665
+ "data": {
666
+ "text/plain": [
667
+ "'<pad> PETALING JAYA: The meeting of former Prime Minister Datuk Seri Najib Tun Razak and Deputy Prime Minister'"
668
+ ]
669
+ },
670
+ "execution_count": 22,
671
+ "metadata": {},
672
+ "output_type": "execute_result"
673
+ }
674
+ ],
675
+ "source": [
676
+ "input_ids = tokenizer.encode('terjemah Melayu ke Inggeris: PETALING JAYA: Pertemuan bekas Perdana Menteri, Datuk Seri Najib Tun Razak dan Timbalan Perdana Menteri, Datuk Seri Ismail Sabri Yaakob hari ini adalah bagi membincangkan isu berkaitan hala tuju dan dasar negara.', return_tensors = 'pt')\n",
677
+ "outputs = model.generate(input_ids)\n",
678
+ "tokenizer.decode(outputs[0])"
679
+ ]
680
+ },
681
+ {
682
+ "cell_type": "code",
683
+ "execution_count": 23,
684
+ "metadata": {},
685
+ "outputs": [
686
+ {
687
+ "data": {
688
+ "text/plain": [
689
+ "'<pad> Roman Catholic Archdiocese of Maracaibo shares border with Roman Catholic Diocese'"
690
+ ]
691
+ },
692
+ "execution_count": 23,
693
+ "metadata": {},
694
+ "output_type": "execute_result"
695
+ }
696
+ ],
697
+ "source": [
698
+ "input_ids = tokenizer.encode('grafik pengetahuan: Keuskupan Agung Katolik Rom Maracaibo terletak di barat daya Keuskupan Katolik Rom Machiques.', return_tensors = 'pt')\n",
699
+ "outputs = model.generate(input_ids)\n",
700
+ "tokenizer.decode(outputs[0])"
701
+ ]
702
+ },
703
+ {
704
+ "cell_type": "code",
705
+ "execution_count": 24,
706
+ "metadata": {},
707
+ "outputs": [],
708
+ "source": [
709
+ "!rm -rf t5-small-v2"
710
+ ]
711
+ }
712
+ ],
713
+ "metadata": {
714
+ "kernelspec": {
715
+ "display_name": "Python 3",
716
+ "language": "python",
717
+ "name": "python3"
718
+ },
719
+ "language_info": {
720
+ "codemirror_mode": {
721
+ "name": "ipython",
722
+ "version": 3
723
+ },
724
+ "file_extension": ".py",
725
+ "mimetype": "text/x-python",
726
+ "name": "python",
727
+ "nbconvert_exporter": "python",
728
+ "pygments_lexer": "ipython3",
729
+ "version": "3.7.7"
730
+ }
731
+ },
732
+ "nbformat": 4,
733
+ "nbformat_minor": 4
734
+ }
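The translation outputs recorded in the notebook end mid-sentence and still carry the `<pad>` marker. A likely cause, stated here as an assumption because the notebook does not say so, is that `generate` was called with its default length limit and `decode` was called without `skip_special_tokens`. The sketch below shows one way to pass both options; `generate_text` is a hypothetical helper name and `max_length=256` is an arbitrary illustrative value.

```python
def generate_text(prompt, model_dir="./", max_length=256):
    """Generate a completion for `prompt` and strip special tokens.

    `model_dir` defaults to the directory the notebook saved into;
    `max_length=256` is an arbitrary limit chosen for illustration.
    """
    # Imported lazily so that defining the helper does not require the
    # heavy dependencies to be installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_dir)
    model = T5ForConditionalGeneration.from_pretrained(model_dir)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # An explicit max_length replaces the library's default limit.
    outputs = model.generate(input_ids, max_length=max_length)
    # skip_special_tokens drops markers such as <pad> and </s>.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `generate_text('terjemah Inggeris ke Melayu: ...')` with the same inputs as the notebook cells would show whether the longer limit yields complete sentences; no output is given here because it has not been run.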
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1b447c69d9920811200c7322e2c594eef1099e7df6f7cc8fdf311e2ea8ef670e
3
+ size 242087629
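The three lines above are a Git LFS pointer rather than the weights themselves: they record the spec version, a SHA-256 digest, and the size in bytes of the real file. A minimal sketch of checking a downloaded file against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names and use only the Python standard library.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file has the size and SHA-256 digest in the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the pytorch_model.bin entry in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1b447c69d9920811200c7322e2c594eef1099e7df6f7cc8fdf311e2ea8ef670e
size 242087629
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algorithm"], pointer["size"])
```

After downloading the file, `matches_pointer('pytorch_model.bin', pointer)` would confirm that the local copy is the one this commit refers to.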
sp10m.cased.ms-en.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:26de51154cccc9db6e65e5d466bdb0b1fff9fab1d80f4689711de943448addd6
3
+ size 803030
special_tokens_map.json ADDED
@@ -0,0 +1 @@
1
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>", "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>", "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>", "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>", "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>", "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>", "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>", "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>", "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>", "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>", "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>", "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>", "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>", "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>", "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>", "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>", "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>", "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>", "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>", "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"]}
spiece.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:26de51154cccc9db6e65e5d466bdb0b1fff9fab1d80f4689711de943448addd6
3
+ size 803030
tokenizer_config.json ADDED
@@ -0,0 +1 @@
1
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "extra_ids": 100, "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>", "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>", "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>", "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>", "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>", "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>", "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>", "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>", "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>", "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>", "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>", "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>", "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>", "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>", "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>", "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>", "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>", "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>", "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>", "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"], "sp_model_kwargs": {}, "tokenizer_class": "T5Tokenizer"}
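The `additional_special_tokens` list in `tokenizer_config.json` follows a regular pattern: one `<extra_id_N>` sentinel for each of the 100 `extra_ids`. The short sketch below regenerates the same list, which can help when comparing configs by eye; `sentinel_tokens` is a hypothetical helper name, not a function from `transformers`.

```python
def sentinel_tokens(extra_ids=100):
    """Return the sentinel token strings <extra_id_0> .. <extra_id_{n-1}>."""
    return [f"<extra_id_{i}>" for i in range(extra_ids)]


tokens = sentinel_tokens()
print(len(tokens), tokens[0], tokens[-1])
```

With the default of 100, the list runs from `<extra_id_0>` to `<extra_id_99>`, matching the entries in both `special_tokens_map.json` and `tokenizer_config.json`.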