seyoungsong committed
Commit 7ff7a54
1 Parent(s): 1a680b7

Update README

Files changed (2)
  1. .gitignore +2 -0
  2. README.md +263 -2
.gitignore CHANGED
@@ -1,3 +1,5 @@
 
 
+ temp.*
+
  # Byte-compiled / optimized / DLL files
  __pycache__/
  *.py[cod]
README.md CHANGED
@@ -1,4 +1,265 @@
  ---
- license: mit
  pipeline_tag: translation
- ---
+ license: mit
+ language:
+ - multilingual
+ - af
+ - am
+ - ar
+ - ast
+ - az
+ - ba
+ - be
+ - bg
+ - bn
+ - br
+ - bs
+ - ca
+ - ceb
+ - cs
+ - cy
+ - da
+ - de
+ - el
+ - en
+ - es
+ - et
+ - fa
+ - ff
+ - fi
+ - fr
+ - fy
+ - ga
+ - gd
+ - gl
+ - gu
+ - ha
+ - he
+ - hi
+ - hr
+ - ht
+ - hu
+ - hy
+ - id
+ - ig
+ - ilo
+ - is
+ - it
+ - ja
+ - jv
+ - ka
+ - kk
+ - km
+ - kn
+ - ko
+ - lb
+ - lg
+ - ln
+ - lo
+ - lt
+ - lv
+ - mg
+ - mk
+ - ml
+ - mn
+ - mr
+ - ms
+ - my
+ - ne
+ - nl
+ - no
+ - ns
+ - oc
+ - or
+ - pa
+ - pl
+ - ps
+ - pt
+ - ro
+ - ru
+ - sd
+ - si
+ - sk
+ - sl
+ - so
+ - sq
+ - sr
+ - ss
+ - su
+ - sv
+ - sw
+ - ta
+ - th
+ - tl
+ - tn
+ - tr
+ - uk
+ - ur
+ - uz
+ - vi
+ - wo
+ - xh
+ - yi
+ - yo
+ - zh
+ - zu
+ ---
+
+ # `flores101_mm100_175M`
+
+ https://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html
+
+ `flores101_mm100_175M` is a multilingual encoder-decoder (seq-to-seq) model trained for many-to-many multilingual translation. It was first released in the [fairseq `flores101` examples](https://github.com/facebookresearch/fairseq/tree/main/examples/flores101) repository.
+
+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
+ chinese_text = "生活就像一盒巧克力。"
+
+ model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
+ tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
+
+ # translate Hindi to French
+ tokenizer.src_lang = "hi"
+ encoded_hi = tokenizer(hi_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
+ tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
+ # => "La vie est comme une boîte de chocolat."
+
+ # translate Chinese to English
+ tokenizer.src_lang = "zh"
+ encoded_zh = tokenizer(chinese_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
+ tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
+ # => "Life is like a box of chocolate."
+ ```
+
+ ## Languages covered
+
+ | Language | lang code |
+ | ---------------- | --------- |
+ | Afrikaans | af |
+ | Amharic | am |
+ | Arabic | ar |
+ | Assamese | as |
+ | Asturian | ast |
+ | Aymara | ay |
+ | Azerbaijani | az |
+ | Bashkir | ba |
+ | Belarusian | be |
+ | Bulgarian | bg |
+ | Bengali | bn |
+ | Breton | br |
+ | Bosnian | bs |
+ | Catalan | ca |
+ | Cebuano | ceb |
+ | Chokwe | cjk |
+ | Czech | cs |
+ | Welsh | cy |
+ | Danish | da |
+ | German | de |
+ | Dyula | dyu |
+ | Greek | el |
+ | English | en |
+ | Spanish | es |
+ | Estonian | et |
+ | Persian | fa |
+ | Fulah | ff |
+ | Finnish | fi |
+ | French | fr |
+ | Western Frisian | fy |
+ | Irish | ga |
+ | Scottish Gaelic | gd |
+ | Galician | gl |
+ | Gujarati | gu |
+ | Hausa | ha |
+ | Hebrew | he |
+ | Hindi | hi |
+ | Croatian | hr |
+ | Haitian Creole | ht |
+ | Hungarian | hu |
+ | Armenian | hy |
+ | Indonesian | id |
+ | Igbo | ig |
+ | Iloko | ilo |
+ | Icelandic | is |
+ | Italian | it |
+ | Japanese | ja |
+ | Javanese | jv |
+ | Georgian | ka |
+ | Kachin | kac |
+ | Kamba | kam |
+ | Kabuverdianu | kea |
+ | Kongo | kg |
+ | Kazakh | kk |
+ | Central Khmer | km |
+ | Kimbundu | kmb |
+ | Northern Kurdish | kmr |
+ | Kannada | kn |
+ | Korean | ko |
+ | Kurdish | ku |
+ | Kyrgyz | ky |
+ | Luxembourgish | lb |
+ | Ganda | lg |
+ | Lingala | ln |
+ | Lao | lo |
+ | Lithuanian | lt |
+ | Luo | luo |
+ | Latvian | lv |
+ | Malagasy | mg |
+ | Maori | mi |
+ | Macedonian | mk |
+ | Malayalam | ml |
+ | Mongolian | mn |
+ | Marathi | mr |
+ | Malay | ms |
+ | Maltese | mt |
+ | Burmese | my |
+ | Nepali | ne |
+ | Dutch | nl |
+ | Norwegian | no |
+ | Northern Sotho | ns |
+ | Nyanja | ny |
+ | Occitan | oc |
+ | Oromo | om |
+ | Oriya | or |
+ | Punjabi | pa |
+ | Polish | pl |
+ | Pashto | ps |
+ | Portuguese | pt |
+ | Quechua | qu |
+ | Romanian | ro |
+ | Russian | ru |
+ | Sindhi | sd |
+ | Shan | shn |
+ | Sinhala | si |
+ | Slovak | sk |
+ | Slovenian | sl |
+ | Shona | sn |
+ | Somali | so |
+ | Albanian | sq |
+ | Serbian | sr |
+ | Swati | ss |
+ | Sundanese | su |
+ | Swedish | sv |
+ | Swahili | sw |
+ | Tamil | ta |
+ | Telugu | te |
+ | Tajik | tg |
+ | Thai | th |
+ | Tigrinya | ti |
+ | Tagalog | tl |
+ | Tswana | tn |
+ | Turkish | tr |
+ | Ukrainian | uk |
+ | Umbundu | umb |
+ | Urdu | ur |
+ | Uzbek | uz |
+ | Vietnamese | vi |
+ | Wolof | wo |
+ | Xhosa | xh |
+ | Yiddish | yi |
+ | Yoruba | yo |
+ | Chinese | zh |
+ | Zulu | zu |
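
The codes in the table above are the values the README snippet passes to `tokenizer.src_lang` and `tokenizer.get_lang_id`. As a rough sketch (not part of this commit), a language pair could be checked against the table before calling the model. The names `SUPPORTED_LANGS`, `check_lang`, and `translation_pair` are hypothetical, and the dict copies only a few rows of the table for illustration:

```python
from typing import Tuple

# Hypothetical subset of the "Languages covered" table (lang code -> name).
# A real check would list every row of the table.
SUPPORTED_LANGS = {
    "af": "Afrikaans",
    "en": "English",
    "fr": "French",
    "hi": "Hindi",
    "ko": "Korean",
    "zh": "Chinese",
}


def check_lang(code: str) -> str:
    """Return the normalized code if it is listed, else raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


def translation_pair(src: str, tgt: str) -> Tuple[str, str]:
    """Validate a source/target pair and return the normalized codes."""
    src_code = check_lang(src)
    tgt_code = check_lang(tgt)
    if src_code == tgt_code:
        raise ValueError("source and target languages must differ")
    return src_code, tgt_code
```

Under these assumptions, the result of `translation_pair("hi", "fr")` would supply the source code for `tokenizer.src_lang` and the target code for `tokenizer.get_lang_id`, so a mistyped code fails early instead of inside generation.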