---
language: de
widget:
- text: Guten Morgen, meine Liebe
example_title: NOT TOXIC 1
- text: Ich scheiß drauf.
example_title: TOXIC 1
- text: Ich liebe dich
example_title: NOT TOXIC 2
- text: Ich hab die Schnauze voll von diesen Irren.
example_title: TOXIC 2
- text: Ich wünsche Ihnen einen schönen Tag!
example_title: NOT TOXIC 3
- text: Nigger
example_title: TOXIC 3
- text: Du bist schon wieder zu spät!
example_title: NOT TOXIC 4
- text: Beweg deinen AArschhh hier rüber
example_title: TOXIC 4
license: other
---
## Description
NB: this model is an improved version of EIStakovskii/german_toxicity_classifier_plus. For the training source code and the data, please follow the GitHub link.

The model was trained for toxicity labeling. It was fine-tuned from the dbmdz/bert-base-german-cased model.
To use the model:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="EIStakovskii/german_toxicity_classifier_plus_v2")
print(classifier("Verpiss dich von hier"))
```
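The pipeline returns a list of dictionaries, one per input, each with a `label` and a confidence `score`. A minimal sketch of mapping those labels to readable names, assuming the checkpoint uses the generic `LABEL_0`/`LABEL_1` ids with `LABEL_1` meaning toxic (verify this mapping against a known example such as the widget sentences above before relying on it):

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="EIStakovskii/german_toxicity_classifier_plus_v2")

# Assumed mapping: LABEL_0 = non-toxic, LABEL_1 = toxic -- check before use.
LABEL_NAMES = {"LABEL_0": "NOT TOXIC", "LABEL_1": "TOXIC"}

for sentence in ["Ich wünsche Ihnen einen schönen Tag!", "Verpiss dich von hier"]:
    result = classifier(sentence)[0]
    print(sentence, "->", LABEL_NAMES.get(result["label"], result["label"]), round(result["score"], 3))
```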
## Metrics (at validation)

| epoch | step | eval_accuracy | eval_f1 | eval_loss |
|---|---|---|---|---|
| 0.8 | 1200 | 0.9132 | 0.9114 | 0.2414 |
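For reference, a minimal sketch of the kind of `compute_metrics` function that, passed to a Hugging Face `Trainer`, would produce the accuracy and F1 columns above; the actual training script may differ (see the linked GitHub repository):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary F1 on the toxic class
    }
```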
## Comparison against Perspective

This model was compared against Google's Perspective API, which similarly detects toxicity. Both models were tested on two datasets, one of 200 sentences and one of 400 sentences. The first (arguably harder) dataset was collected from sentences in the JigSaw and DeTox datasets. The second (easier) one was collected from a combination of sources: JigSaw and DeTox, as well as ParaDetox translations and sentences extracted from Reverso Context by keywords.
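A minimal sketch of how such a comparison could be run. The request shape follows the public Perspective commentanalyzer API, while the 0.5 decision threshold, the `LABEL_1` = toxic mapping, and the toy dataset are assumptions for illustration, not details taken from the evaluation above:

```python
import requests
from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def perspective_toxicity(text: str) -> float:
    """Return Perspective's TOXICITY summary score for a German sentence."""
    body = {
        "comment": {"text": text},
        "languages": ["de"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": API_KEY}, json=body)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

classifier = pipeline("text-classification", model="EIStakovskii/german_toxicity_classifier_plus_v2")

# Toy stand-in for the labeled evaluation sets (1 = toxic).
sentences, gold = ["Ich liebe dich", "Verpiss dich von hier"], [0, 1]

bert_preds = [int(classifier(s)[0]["label"] == "LABEL_1") for s in sentences]  # assumed label id
persp_preds = [int(perspective_toxicity(s) >= 0.5) for s in sentences]  # assumed threshold

for name, preds in [("classifier", bert_preds), ("Perspective", persp_preds)]:
    print(name, "accuracy:", accuracy_score(gold, preds), "f1:", f1_score(gold, preds))
```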
### german_toxicity_classifier_plus_v2

| size | accuracy | f1 |
|---|---|---|
| 200 | 0.767 | 0.787 |
| 400 | 0.9650 | 0.9651 |
### Perspective

| size | accuracy | f1 |
|---|---|---|
| 200 | 0.834 | 0.820 |
| 400 | 0.892 | 0.885 |