oliverguhr committed on
Commit 0a5c276 · 1 Parent(s): 6123070

model card updated

Files changed (1)
  1. README.md +35 -0
README.md CHANGED
@@ -21,3 +21,38 @@ widget:
  metrics:
  - f1
  ---
+
+ This model predicts the punctuation of plain text written in English, Italian, French and German. We developed it to restore the punctuation of transcribed spoken language.
+
+ This multilingual model was trained on the [Europarl Dataset](https://huggingface.co/datasets/wmt/europarl) provided by the [SEPP-NLG Shared Task](https://sites.google.com/view/sentence-segmentation). *Please note that this dataset consists of political speeches. Therefore, the model might perform differently on texts from other domains.*
+
+ The model restores the following punctuation markers: **"." "," "?" "-" ":"**
+
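To illustrate how per-word predictions of this kind can be turned back into punctuated text, here is a minimal sketch. It assumes one predicted label per word, where `0` means that no punctuation follows the word (as the label column in the Results table suggests). The function name `apply_punctuation` and the sample words and labels are hypothetical and are not taken from the model's code.

```python
# Punctuation markers the model card lists as restorable.
PUNCTUATION = {".", ",", "?", "-", ":"}


def apply_punctuation(words, labels):
    """Append each predicted marker to its word; the label "0" adds nothing.

    Hypothetical helper for illustration only, not part of the model's API.
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    out = []
    for word, label in zip(words, labels):
        if label == "0":
            out.append(word)
        elif label in PUNCTUATION:
            out.append(word + label)
        else:
            raise ValueError(f"unknown label: {label!r}")
    return " ".join(out)


# Hand-written sample labels, not real model output.
words = ["how", "are", "you", "fine", "thanks"]
labels = ["0", "0", "?", ",", "."]
print(apply_punctuation(words, labels))  # how are you? fine, thanks.
```

The sketch leaves casing untouched, since the model card only describes punctuation restoration.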
+ ## Results
+
+ The model achieves the following F1 scores for the different languages. Performance varies between the individual punctuation markers. In many cases, hyphens and colons are optional and can be replaced by either a comma or a full stop.
+
+ | Label | EN | DE | FR | IT |
+ | ------------- | ----- | ----- | ----- | ----- |
+ | 0 | 0.991 | 0.997 | 0.992 | 0.989 |
+ | . | 0.948 | 0.961 | 0.945 | 0.942 |
+ | ? | 0.890 | 0.893 | 0.871 | 0.832 |
+ | , | 0.819 | 0.945 | 0.831 | 0.798 |
+ | : | 0.575 | 0.652 | 0.620 | 0.588 |
+ | - | 0.425 | 0.435 | 0.431 | 0.421 |
+ | macro average | 0.775 | 0.814 | 0.782 | 0.762 |
+
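The macro average row is consistent with an unweighted mean of the six per-label F1 scores, so the rare labels `:` and `-` count as much as the frequent `0` label. The following sketch recomputes it from the table values; the names `f1` and `macro_average` are illustrative only.

```python
# Per-label F1 scores copied from the table, in the order: 0 . ? , : -
f1 = {
    "EN": [0.991, 0.948, 0.890, 0.819, 0.575, 0.425],
    "DE": [0.997, 0.961, 0.893, 0.945, 0.652, 0.435],
    "FR": [0.992, 0.945, 0.871, 0.831, 0.620, 0.431],
    "IT": [0.989, 0.942, 0.832, 0.798, 0.588, 0.421],
}


def macro_average(scores):
    """Unweighted mean: every label contributes equally, regardless of frequency."""
    return sum(scores) / len(scores)


for lang, scores in f1.items():
    print(lang, round(macro_average(scores), 3))
# EN 0.775
# DE 0.814
# FR 0.782
# IT 0.762
```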
+
+ ## References
+
+ @inproceedings{guhr-EtAl:2021:fullstop,
+ title = {FullStop: Multilingual Deep Models for Punctuation Prediction},
+ author = {Guhr, Oliver and Schumann, Anne-Kathrin and Bahrmann, Frank and Böhme, Hans Joachim},
+ booktitle = {Proceedings of the Swiss Text Analytics Conference 2021},
+ month = {June},
+ year = {2021},
+ address = {Winterthur, Switzerland},
+ publisher = {CEUR Workshop Proceedings},
+ url = {http://ceur-ws.org/Vol-2957/sepp_paper4.pdf}
+ }
+