Commit
·
0a5c276
1
Parent(s):
6123070
model card updated
README.md
CHANGED
@@ -21,3 +21,38 @@ widget:
|
|
21 |
metrics:
|
22 |
- f1
|
23 |
---
|
24 |
+
|
25 |
+
This model predicts the punctuation of plain text written in English, Italian, French and German. We developed it to restore the punctuation of transcribed spoken language.
|
26 |
+
|
27 |
+
This multilingual model was trained on the [Europarl Dataset](https://huggingface.co/datasets/wmt/europarl) provided by the [SEPP-NLG Shared Task](https://sites.google.com/view/sentence-segmentation). *Please note that this dataset consists of political speeches. Therefore, the model might perform differently on texts from other domains.*
|
28 |
+
|
29 |
+
The model restores the following punctuation markers: **"." "," "?" "-" ":"**
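As a rough illustration of what restoring these markers means, the sketch below merges per-word labels back into text. It is not the model's own post-processing: the function name `apply_labels`, the example sentence, and the assumption that the model emits one label per word (with `0` meaning "no punctuation after this word") are all illustrative.

```python
# Markers listed in the model card. "0" is assumed to mean
# "no punctuation follows this word".
PUNCTUATION_LABELS = {".", ",", "?", "-", ":"}
NO_PUNCTUATION = "0"


def apply_labels(words, labels):
    """Attach each predicted marker to the word it follows (hypothetical helper)."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    pieces = []
    for word, label in zip(words, labels):
        if label == NO_PUNCTUATION:
            pieces.append(word)
        elif label in PUNCTUATION_LABELS:
            # Simplification: every marker, including "-", is glued to the word.
            pieces.append(word + label)
        else:
            raise ValueError(f"unknown label: {label!r}")
    return " ".join(pieces)


# Hypothetical predictions for an unpunctuated transcript.
words = ["my", "name", "is", "clara", "and", "i", "live", "in", "berkeley"]
labels = ["0", "0", "0", "0", "0", "0", "0", "0", "."]
restored = apply_labels(words, labels)
```

A real pipeline would also have to map subword tokens back to words before this step, which is omitted here.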
|
30 |
+
|
31 |
+
## Results
|
32 |
+
|
33 |
+
The model achieves the following F1 scores for the different languages. Performance varies by punctuation marker: hyphens and colons are, in many cases, optional and can be replaced by either a comma or a full stop, which likely explains their lower scores.
|
34 |
+
|
35 |
+
| Label | EN | DE | FR | IT |
|
36 |
+
| ------------- | ----- | ----- | ----- | ----- |
|
37 |
+
| 0 | 0.991 | 0.997 | 0.992 | 0.989 |
|
38 |
+
| . | 0.948 | 0.961 | 0.945 | 0.942 |
|
39 |
+
| ? | 0.890 | 0.893 | 0.871 | 0.832 |
|
40 |
+
| , | 0.819 | 0.945 | 0.831 | 0.798 |
|
41 |
+
| : | 0.575 | 0.652 | 0.620 | 0.588 |
|
42 |
+
| - | 0.425 | 0.435 | 0.431 | 0.421 |
|
43 |
+
| macro average | 0.775 | 0.814 | 0.782 | 0.762 |
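The macro average row can be checked with a short script, assuming it is the unweighted mean of the six per-label F1 scores in each column. The variable names are illustrative; the numbers are copied from the table.

```python
from statistics import mean

# Per-label F1 scores from the table, in row order: 0 . ? , : -
f1_scores = {
    "EN": [0.991, 0.948, 0.890, 0.819, 0.575, 0.425],
    "DE": [0.997, 0.961, 0.893, 0.945, 0.652, 0.435],
    "FR": [0.992, 0.945, 0.871, 0.831, 0.620, 0.431],
    "IT": [0.989, 0.942, 0.832, 0.798, 0.588, 0.421],
}

# Macro average: every label counts equally, regardless of how often it occurs.
macro_average = {
    lang: round(mean(scores), 3) for lang, scores in f1_scores.items()
}

for lang, score in macro_average.items():
    print(f"{lang}: {score:.3f}")
```

Because the average is unweighted, the rare and weak labels (`:` and `-`) pull it well below the scores for the frequent labels.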
|
44 |
+
|
45 |
+
|
46 |
+
## References
|
47 |
+
|
48 |
+
@inproceedings{guhr-EtAl:2021:fullstop,
|
49 |
+
title = {FullStop: Multilingual Deep Models for Punctuation Prediction},
|
50 |
+
author = {Guhr, Oliver and Schumann, Anne-Kathrin and Bahrmann, Frank and Böhme, Hans Joachim},
|
51 |
+
booktitle = {Proceedings of the Swiss Text Analytics Conference 2021},
|
52 |
+
month = {June},
|
53 |
+
year = {2021},
|
54 |
+
address = {Winterthur, Switzerland},
|
55 |
+
publisher = {CEUR Workshop Proceedings},
|
56 |
+
url = {http://ceur-ws.org/Vol-2957/sepp_paper4.pdf}
|
57 |
+
}
|
58 |
+
|