martynawck
committed on
Commit
•
38be1cb
1
Parent(s):
01a7a32
Update index.html
index.html +1 -1
index.html
CHANGED
@@ -13,7 +13,7 @@
 <hr>
 <h2 style="text-align: center;">NLPre-PL Dataset</h2>
 <hr>
-<p>The official NLPre-PL dataset - a uniformly paragraph-level divided version of NKJP1M corpus
+<p>The official NLPre-PL dataset - a uniformly paragraph-level divided version of NKJP1M corpus - the 1 million token balanced subcorpus of the National Corpus of Polish (Narodowy Korpus Jezyka Polskiego).
 </p>
 <p>
 The NLPre dataset aims at fairly dividing the paragraphs length-wise and topic-wise into train, development, and test sets. Thus, we ensure a similar number of segments distribution per paragraph and avoid the situation when paragraphs with a small (or large) number of segments are available only e.g. during test time.