martynawck committed on
Commit
01a7a32
•
1 Parent(s): d1096b6

Update index.html

Files changed (1)
  1. index.html +1 -1
index.html CHANGED
@@ -13,7 +13,7 @@
  <hr>
  <h2 style="text-align: center;">NLPre-PL Dataset</h2>
  <hr>
- <p>The official NLPre-PL dataset - a uniformly paragraph-level divided version of NKJP1M corpus – the 1-million token balanced subcorpus of the National Corpus of Polish (Narodowy Korpus Jezyka Polskiego).
+ <p>The official NLPre-PL dataset - a uniformly paragraph-level divided version of NKJP1M corpus – the 1 million token balanced subcorpus of the National Corpus of Polish (Narodowy Korpus Jezyka Polskiego).
  </p>
  <p>
  The NLPre dataset aims at fairly dividing the paragraphs length-wise and topic-wise into train, development, and test sets. Thus, we ensure a similar number of segments distribution per paragraph and avoid the situation when paragraphs with a small (or large) number of segments are available only e.g. during test time.