laurievb/OpenLID

laurievb committed on Nov 6, 2023

Commit

e5953db

1 Parent(s): 2076a7d

Update README.md

Files changed (1)

README.md CHANGED

@@ -50,9 +50,7 @@ Our work aims to broaden NLP coverage by allowing practitioners to identify rele
 ## Training data
-The model was trained on the OpenLID dataset which is available [through the github repo](https://github.com/laurieburchell/open-lid-dataset) or on HuggingFace.
-The final dataset contains 121 million lines of data in 201 language classes. Before sampling, the mean number of lines per language is 602,812. The smallest class contains 532 lines of data (South Azerbaijani) and the largest contains 7.5 million lines of data (English). More details at paper
+The model was trained on the OpenLID dataset which is available [through the github repo](https://github.com/laurieburchell/open-lid-dataset).
 ## Training procedure