Finnish-NLP/gpt2-finnish

aapot committed on Feb 13, 2022

Commit f78faa9 • 1 Parent(s): 84a9528

Update README.md

Files changed (1)

README.md +2 -2

@@ -9,7 +9,7 @@ datasets:
 - Finnish-NLP/mc4_fi_cleaned
 - wikipedia
 widget:
-- text: "Olipa kerran tekoäly"
 ---
@@ -87,7 +87,7 @@ As with all language models, it is hard to predict in advance how the Finnish GP
 ## Training data
-This Finnish GPT-2 model was pretrained on the combination of five datasets:
 - [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned), the dataset mC4 is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus. We used the Finnish subset of the mC4 dataset and further cleaned it with our own text data cleaning codes (check the dataset repo).
 - [wikipedia](https://huggingface.co/datasets/wikipedia) We used the Finnish subset of the wikipedia (August 2021) dataset
 - [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501)

 - Finnish-NLP/mc4_fi_cleaned
 - wikipedia
 widget:
+- text: "Tekstiä tuottava tekoäly on"
 ---
 ## Training data
+This Finnish GPT-2 model was pretrained on the combination of six datasets:
 - [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned), the dataset mC4 is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus. We used the Finnish subset of the mC4 dataset and further cleaned it with our own text data cleaning codes (check the dataset repo).
 - [wikipedia](https://huggingface.co/datasets/wikipedia) We used the Finnish subset of the wikipedia (August 2021) dataset
 - [Yle Finnish News Archive 2011-2018](http://urn.fi/urn:nbn:fi:lb-2017070501)