hannuta committed on
Commit
7324da2
1 Parent(s): f99bc3c

Update README.md

  1. README.md +4 -3
README.md CHANGED
@@ -41,8 +41,8 @@ The pretrained model learns an inner representation of the english language that
  The model is best at what it was pretrained for however, which is generating texts from a prompt.
  A prompt is a piece of text inserted in the input examples, so that the original task can be formulated as a (masked) language modeling problem.
 
- To fit the model to the domain of german news for the downstream task of title and teaser generation it was finetuned on a dataset with 10.000 german news articles in a multi-task finetuning fashion.
- Hence the finetuned models name drives from the model it was finetuned from (gptj), the downstream generation tasks (title, teaser) and the size of the finetuning dataset (10k)
+ To fit the model to the domain of german news for the downstream task of title and teaser generation it was finetuned on a dataset with 10,000 german news articles in a multi-task finetuning fashion.
+ Hence the finetuned models name drives from the model it was finetuned from (gptj), the downstream generation tasks (title, teaser) and the size of the finetuning dataset (10k).
 
  - **Developed by:** snipaid
  - **Model type:** gptj
@@ -103,7 +103,7 @@ For further information see [limitations and biases of GPT-J](https://huggingfac
 
  <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
- The model was finetuned on a collection of 10.000 news items scraped from different online news outlets* in german language.
+ The model was finetuned on a collection of 10,000 news items scraped from different online news outlets* in german language.
 
  \* *Namely: Speedweek, n-tv, Welt, Tagesspiegel, Faz, Merkur, Bild, Focus, Rp-Online, Freie Presse, Weser-Kurier, Tz, Stern, Kicker, Taz, Schwäbische Zeitung, Frankfurter Rundschau, Stuttgarter Zeitung, Abendzeitung, Donaukurier, Hessische Neidersächsiche Allgemeine, Kreiszeitung, Heise Online, Augsburger Allgemeine, SPOX, Nordbayern, Offenbach Post Online, inFranken, Westfälischer Anzeiger, Tagesschau, Nordkurier, Wallstreet online, Computer Bild, Die Rheinlandpfalz, Morgenweb, Bunte, Sport1, LR-Online, Gala, Wirtschaftswoche, Chip, Brigitte, NWZ Online.*
 
@@ -154,6 +154,7 @@ Carbon emissions were estimated using the [Machine Learning Impact calculator](h
 
  # Glossary
 
+ **News Item**, aka news article. A particular piece of news, usually from a journalistic source.
  **News Document**, plain text form of a news article or news item.
  **Snippet**, a small section of text that is related to a news document.
  **Title** aka headline. A few words that reflect the essence of the news story.