
---
language: de
widget:
- text: "Guten Morgen, meine Liebe"
  example_title: "NOT TOXIC 1"
- text: "Ich scheiß drauf."
  example_title: "TOXIC 1"
- text: "Ich liebe dich"
  example_title: "NOT TOXIC 2"
- text: "Ich hab die Schnauze voll von diesen Irren."
  example_title: "TOXIC 2"
- text: "Ich wünsche Ihnen einen schönen Tag!"
  example_title: "NOT TOXIC 3"
- text: "Nigger"
  example_title: "TOXIC 3"
- text: "Du bist schon wieder zu spät!"
  example_title: "NOT TOXIC 4"
- text: "Beweg deinen AArschhh hier rüber"
  example_title: "TOXIC 4"

license: other
---
## Description
NB: this is an improved version of [EIStakovskii/german_toxicity_classifier_plus](https://huggingface.co/EIStakovskii/german_toxicity_classifier_plus).
The training code and data are available on [GitHub](https://github.com/eistakovskii/NLP_projects/tree/main/TEXT_CLASSIFICATION).

This model was trained to classify German text as toxic or non-toxic.

It was fine-tuned from [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased).

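To use the model:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="EIStakovskii/german_toxicity_classifier_plus_v2")

# "Verpiss dich von hier" is roughly "Piss off out of here";
# the pipeline returns a list like [{'label': ..., 'score': ...}]
print(classifier("Verpiss dich von hier"))
```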
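By default the pipeline returns only the top label and its score. If you want a yes/no decision with your own cut-off, the following minimal sketch converts that output into a toxicity probability and a boolean flag. It assumes the model has exactly two labels (so the probability of the other label is `1 - score`); the helper names `toxicity_probability` and `flag_toxic`, the label names `"toxic"`/`"neutral"`, and the 0.5 default threshold are all hypothetical, since this card does not document the model's label names.

```python
def toxicity_probability(result: dict, toxic_label: str) -> float:
    """Turn one top-1 pipeline result into P(toxic).

    Assumes a two-label model: if the predicted label is not the toxic one,
    the toxic probability is the complement of the reported score.
    """
    if result["label"] == toxic_label:
        return float(result["score"])
    return 1.0 - float(result["score"])


def flag_toxic(results: list, toxic_label: str, threshold: float = 0.5) -> list:
    """Return True for every result whose P(toxic) reaches `threshold`."""
    return [toxicity_probability(r, toxic_label) >= threshold for r in results]


if __name__ == "__main__":
    # Hypothetical pipeline outputs; "toxic" and "neutral" are placeholder labels.
    sample = [
        {"label": "toxic", "score": 0.92},
        {"label": "neutral", "score": 0.81},
    ]
    print(flag_toxic(sample, toxic_label="toxic"))  # [True, False]
```

With the real model, check the actual label names via `classifier.model.config.id2label` and pass the list returned by `classifier([...])` as `results`. Raising `threshold` trades recall for precision on the toxic class.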

## Validation metrics

| epoch | step | eval_accuracy | eval_f1 | eval_loss |
|---|---|---|---|---|
| 0.8 | 1200 | 0.9132176234979973 | 0.9113535629048755 | 0.24135465919971466 |
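For reference, these are the standard definitions behind the accuracy and F1 columns, with TP, FP, FN and TN counted with respect to the toxic class. The card does not say whether `eval_f1` is the binary F1 of the toxic class or an average over both classes; the formulas below assume binary F1 (a macro F1 would instead average the per-class $F_1$ values).

$$
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$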

## Comparison against Perspective

This model was compared against Google's [Perspective API](https://developers.perspectiveapi.com/s/?language=en_US), which also detects toxicity.
Both models were tested on two test sets of [200 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_200.csv) and [400 sentences](https://github.com/eistakovskii/NLP_projects/blob/main/TEXT_CLASSIFICATION/data/Toxicity_Classifiers/DE_FR/test/test_de_400.csv).
The first set (arguably the harder one) was drawn from the [JigSaw](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/data) and [DeTox](https://github.com/hdaSprachtechnologie/detox) datasets.
The second set (the easier one) combines sentences from JigSaw and DeTox with [Paradetox](https://github.com/s-nlp/multilingual_detox/tree/main/data) translations and sentences extracted from [Reverso Context](https://context.reverso.net/translation/) by keyword search.
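The evaluation script for this comparison is not included in the card. As a rough sketch of how such a comparison can be run, the code below reads (text, gold label) pairs from CSV lines and scores any `predict` function that returns 0 (non-toxic) or 1 (toxic). The column names `text` and `label`, the 0/1 label encoding, and the helper names `load_pairs` and `evaluate` are assumptions; check the header of the linked test files before using it. F1 here is the binary F1 of the toxic class.

```python
import csv
from typing import Callable, Dict, Iterable, List, Tuple


def load_pairs(lines: Iterable[str], text_col: str = "text",
               label_col: str = "label") -> List[Tuple[str, int]]:
    """Read (text, gold_label) pairs from CSV lines.

    `text_col` and `label_col` are hypothetical defaults; adjust them to the
    real header of the test files.
    """
    reader = csv.DictReader(lines)
    return [(row[text_col], int(row[label_col])) for row in reader]


def evaluate(predict: Callable[[str], int], pairs: List[Tuple[str, int]],
             positive: int = 1) -> Dict[str, float]:
    """Accuracy and binary F1 of the `positive` (toxic) class for two labels."""
    if not pairs:
        raise ValueError("no evaluation pairs given")
    tp = fp = fn = tn = 0
    for text, gold in pairs:
        pred = predict(text)
        if pred == positive and gold == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif gold == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Toy data and a toy keyword "model", for illustration only.
    rows = ["text,label", "Ich liebe dich,0", "Du Idiot,1",
            "Schönen Tag,0", "Halt die Klappe,1"]
    scores = evaluate(lambda t: int("Idiot" in t), load_pairs(rows))
    print(scores)
```

To read a real test file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_pairs`. For this model, `predict` would wrap the pipeline and map its top label to 0/1; for Perspective, it would threshold the returned toxicity score.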

### german_toxicity_classifier_plus_v2

| Test set size (sentences) | Accuracy | F1 |
|---|---|---|
| 200 | 0.767 | 0.787 |
| 400 | 0.9650 | 0.9651 |

### Perspective

| Test set size (sentences) | Accuracy | F1 |
|---|---|---|
| 200 | 0.834 | 0.820 |
| 400 | 0.892 | 0.885 |