EIStakovskii/german_toxicity_classifier_plus_v2

Text Classification

EIStakovskii committed on Jan 26, 2023

Commit ab9d2b6 · 1 Parent(s): 780df4a

Update README.md

Files changed (1)

README.md +20 -1

README.md CHANGED

@@ -32,4 +32,23 @@ The model was fine-tuned based off [the dbmdz/bert-base-german-cased model](http
 epoch|step|eval_accuracy|eval_f1|eval_loss
 -|-|-|-|-
-0.8|1200|0.9132176234979973|0.9113535629048755|0.24135465919971466

 epoch|step|eval_accuracy|eval_f1|eval_loss
 -|-|-|-|-
+0.8|1200|0.9132176234979973|0.9113535629048755|0.24135465919971466
+## Comparison against Perspective
+This model was compared against Google's [Perspective API](https://developers.perspectiveapi.com/s/?language=en_US), which also detects toxicity.
+Both models were tested on two datasets, one of [200 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_200.csv) and one of [400 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_400.csv).
+The first (arguably harder) was drawn from the [Jigsaw](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/data) and [DeTox](https://github.com/hdaSprachtechnologie/detox) datasets.
+The second (easier) combines several sources: Jigsaw and DeTox, [ParaDetox](https://github.com/s-nlp/multilingual_detox/tree/main/data) translations, and sentences extracted from [Reverso Context](https://context.reverso.net/translation/) by keyword.
+### german_toxicity_classifier_plus_v2
+size|accuracy|f1
+-|-|-
+200|0.767|0.787
+400|0.9650|0.9651
+### Perspective
+size|accuracy|f1
+-|-|-
+200|0.834|0.820
+400|0.892|0.885
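
The accuracy and F1 columns in the two tables above could be computed from gold labels and model predictions roughly as follows. This is a minimal sketch, not the author's evaluation script: the function name `accuracy_and_f1`, the 0/1 label encoding (1 = toxic), and the choice of binary F1 on the toxic class are all assumptions, since the README does not state how the scores were obtained. The labels in the example are toy values, not data from the test sets.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    Assumption: `positive` marks the toxic class and F1 is computed
    for that class only (the README does not say which averaging was used).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy example (hypothetical labels, 1 = toxic, 0 = non-toxic).
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

Running the same function on the predictions of both systems over the same test file keeps the comparison like for like; for Perspective, this presupposes that its toxicity score has first been thresholded into a 0/1 label, and the threshold used is not given in the README.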