Update README.md
README.md
CHANGED
@@ -50,9 +50,7 @@ Our work aims to broaden NLP coverage by allowing practitioners to identify rele
 
 ## Training data
 
-The model was trained on the OpenLID dataset which is available [through the github repo](https://github.com/laurieburchell/open-lid-dataset)
-
-The final dataset contains 121 million lines of data in 201 language classes. Before sampling, the mean number of lines per language is 602,812. The smallest class contains 532 lines of data (South Azerbaijani) and the largest contains 7.5 million lines of data (English). More details at paper
+The model was trained on the OpenLID dataset which is available [through the github repo](https://github.com/laurieburchell/open-lid-dataset).
 
 ## Training procedure
 