Stopwords Corpus

This corpus contains lists of stop words for several languages. These
are high-frequency grammatical words which are usually ignored in text
retrieval applications.
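
As an illustration of how such a list is typically used, the sketch below
drops stop words from a token sequence. It assumes each list is a plain-text
file with one word per line; the function names and the inline sample are
hypothetical and not part of this corpus. With NLTK installed and the corpus
downloaded, nltk.corpus.stopwords.words("english") returns a comparable list
of words.

```python
from pathlib import Path


def parse_stopwords(text):
    """Turn file contents (assumed: one stop word per line) into a set."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_stopwords(path):
    """Read a stop word file, e.g. a hypothetical local copy named 'english'."""
    return parse_stopwords(Path(path).read_text(encoding="utf-8"))


def remove_stopwords(tokens, stopwords):
    """Keep only tokens whose lowercase form is not a stop word."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


# Tiny inline sample standing in for a real list file (illustrative only).
sample = parse_stopwords("the\nis\non\n")
tokens = "The cat is on the mat".split()
filtered = remove_stopwords(tokens, sample)
print(filtered)  # ['cat', 'mat']
```

The comparison is case-insensitive because the tokens are lowercased before
the lookup; whether that is appropriate depends on the language and the
application.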

They were obtained from:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/

The stop words for the Romanian language were obtained from:
http://arlc.ro/resources/

The English list has been augmented:
https://github.com/nltk/nltk_data/issues/22

The German list has been corrected:
https://github.com/nltk/nltk_data/pull/49

A Kazakh list has been added:
https://github.com/nltk/nltk_data/pull/52

A Nepali list has been added:
https://github.com/nltk/nltk_data/pull/83

An Azerbaijani list has been added:
https://github.com/nltk/nltk_data/pull/100

A Greek list has been added:
https://github.com/nltk/nltk_data/pull/103

An Indonesian list has been added:
https://github.com/nltk/nltk_data/pull/112