EIStakovskii committed
Commit ab9d2b6 · 1 parent: 780df4a

Update README.md

Files changed (1): README.md (+20 −1)

README.md CHANGED
@@ -32,4 +32,23 @@ The model was fine-tuned based off [the dbmdz/bert-base-german-cased model](http
 
 epoch|step|eval_accuracy|eval_f1|eval_loss
 -|-|-|-|-
-0.8|1200|0.9132176234979973|0.9113535629048755|0.24135465919971466
+0.8|1200|0.9132176234979973|0.9113535629048755|0.24135465919971466
+
+## Comparison against Perspective
+
+This model was compared against Google's [Perspective API](https://developers.perspectiveapi.com/s/?language=en_US), which similarly detects toxicity.
+The two models were tested on two datasets of [200 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_200.csv) and [400 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_400.csv) respectively.
+The first dataset (arguably the harder one) was collected from sentences in the [JigSaw](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/data) and [DeTox](https://github.com/hdaSprachtechnologie/detox) datasets.
+The second (easier) dataset was collected from a combination of sources: JigSaw and DeTox, as well as [Paradetox](https://github.com/s-nlp/multilingual_detox/tree/main/data) translations and sentences extracted from [Reverso Context](https://context.reverso.net/translation/) by keywords.
+
+### german_toxicity_classifier_plus_v2
+size|accuracy|f1
+-|-|-
+200|0.767|0.787
+400|0.9650|0.9651
+
+### Perspective
+size|accuracy|f1
+-|-|-
+200|0.834|0.820
+400|0.892|0.885