EIStakovskii
committed on
Commit
·
ab9d2b6
1
Parent(s):
780df4a
Update README.md
README.md
CHANGED
@@ -32,4 +32,23 @@ The model was fine-tuned based off [the dbmdz/bert-base-german-cased model](http
|
|
32 |
|
33 |
epoch|step|eval_accuracy|eval_f1|eval_loss
|
34 |
-|-|-|-|-
|
35 |
-
0.8|1200|0.9132176234979973|0.9113535629048755|0.24135465919971466
|
|
32 |
|
33 |
epoch|step|eval_accuracy|eval_f1|eval_loss
|
34 |
-|-|-|-|-
|
35 |
+
0.8|1200|0.9132176234979973|0.9113535629048755|0.24135465919971466
|
36 |
+
|
37 |
+
## Comparison against Perspective
|
38 |
+
|
39 |
+
This model was compared against Google's [Perspective API](https://developers.perspectiveapi.com/s/?language=en_US), which also detects toxicity.
|
40 |
+
Both models were tested on two datasets: one of [200 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_200.csv) and one of [400 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_400.csv).
|
41 |
+
The first dataset (arguably the harder one) was compiled from sentences in the [JigSaw](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/data) and [DeTox](https://github.com/hdaSprachtechnologie/detox) datasets.
|
42 |
+
The second dataset (the easier one) combines several sources: JigSaw and DeTox, [Paradetox](https://github.com/s-nlp/multilingual_detox/tree/main/data) translations, and sentences extracted from [Reverso Context](https://context.reverso.net/translation/) by keyword.
|
43 |
+
|
44 |
+
### german_toxicity_classifier_plus_v2
|
45 |
+
size|accuracy|f1
|
46 |
+
-|-|-
|
47 |
+
200|0.767|0.787
|
48 |
+
400|0.9650|0.9651
|
49 |
+
|
50 |
+
### Perspective
|
51 |
+
size|accuracy|f1
|
52 |
+
-|-|-
|
53 |
+
200|0.834|0.820
|
54 |
+
400|0.892|0.885
|
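The accuracy and F1 columns in the tables above can be reproduced in spirit with a few lines of Python. The following is a minimal sketch, not the author's evaluation script: it assumes binary labels (1 = toxic, 0 = neutral) and binary F1 on the toxic class, neither of which the README states. The function name `accuracy_and_f1` and the toy labels are hypothetical and chosen for illustration only.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels.

    Assumption: 1 marks the toxic class and 0 the neutral class,
    and F1 is computed for the toxic class only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only; these are not taken from the test sets.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

To score either model this way, `pred` would hold that model's thresholded output for each sentence in the 200- or 400-sentence file and `gold` the file's labels. If the reported F1 was macro- or weighted-averaged instead, the numbers from this sketch would differ.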