Model Card for Toxicity Detection Model
Model Details
Model Description
This model is fine-tuned to detect various types of toxicity in text comments. It was trained on a dataset of labeled Wikipedia comments, where each comment is annotated with one or more toxicity categories. For a given text input, the model predicts the probability of each type of toxicity.
Developed by:
Louis Fournier, Enzo Medrinal, Christian Doan, and Clément Barbier
Funded by [optional]:
[More Information Needed]
Shared by [optional]:
[More Information Needed]
Model type:
Language Model (RoBERTa-based)
Language(s) (NLP):
English
License:
[More Information Needed]
Finetuned from model:
roberta-base
Model Sources [optional]:
- Repository: [temporary_link_to_repo]
- Paper: [More Information Needed]
- Demo: [More Information Needed]
Uses
Direct Use:
This model can be used directly to classify toxic comments. It predicts a probability for each of the following types of toxicity (a sketch of how to read these outputs follows the list):
- Toxic
- Severe Toxic
- Obscene
- Threat
- Insult
- Identity Hate
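As a rough sketch of how these outputs can be read, the six probabilities returned by the model can be zipped with the label names above. The snake_case label order and the 0.5 threshold below are assumptions for illustration only; check the actual ordering against the fine-tuned model's id2label configuration.
import torch
# Assumed label order, for illustration only; verify it against the model's
# config.id2label mapping before relying on it.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
def probabilities_to_labels(probabilities, threshold=0.5):
    # Map a (1, 6) tensor of sigmoid probabilities to (label, score) pairs above the threshold
    scores = probabilities.squeeze(0).tolist()
    return [(label, score) for label, score in zip(LABELS, scores) if score >= threshold]
# Example with made-up probabilities for a single comment
example = torch.tensor([[0.91, 0.12, 0.84, 0.03, 0.77, 0.05]])
print(probabilities_to_labels(example))  # flags 'toxic', 'obscene', and 'insult'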
Downstream Use [optional]:
The model can be integrated into applications that moderate or filter toxic content in user-generated text (a minimal moderation hook is sketched after this list), such as:
- Online comment sections
- Social media platforms
- Customer feedback systems
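As a minimal sketch of such an integration, a moderation hook could hold a comment for review whenever any toxicity probability crosses a threshold. The model identifier is taken from the model tree at the end of this card and the 0.5 threshold is an arbitrary choice; both should be adjusted for the target application.
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
# Assumed model location (see the model tree at the end of this card);
# a local path to the fine-tuned weights works as well.
MODEL_PATH = "lfournier/ToxicityClassifier-RoBERTa"
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()
def should_flag(comment, threshold=0.5):
    # Return True if any of the six toxicity probabilities exceeds the threshold
    inputs = tokenizer(comment, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        probabilities = torch.sigmoid(model(**inputs).logits)
    return bool((probabilities >= threshold).any())
# Example: hold flagged comments for human review instead of publishing them directly
if should_flag("This is a comment example"):
    print("Comment held for moderation")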
Out-of-Scope Use:
This model is not intended for use in:
- Detecting general sentiment (e.g., positive, negative, neutral).
- Predicting toxicity in languages other than English.
Bias, Risks, and Limitations
The model may exhibit biases in its predictions based on the language and topics present in the training data. It has been trained on Wikipedia comments, which may not fully represent the diversity of online discourse. The model may also struggle with:
- Overfitting to specific types of toxicity or language use found in the training data.
- False positives/negatives in detecting toxicity, particularly in ambiguous cases.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. It is recommended to review model outputs in context and combine the model with human moderation for high-stakes applications.
How to Get Started with the Model
Use the following code to get started with the model:
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
# Load the fine-tuned model and the base RoBERTa tokenizer
model = RobertaForSequenceClassification.from_pretrained('path_to_finetuned_model')  # e.g. 'lfournier/ToxicityClassifier-RoBERTa'
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
# Example input
text = "This is a comment example"
# Tokenize the input text
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
# Sigmoid (not softmax) yields an independent probability for each toxicity type (multi-label output)
probabilities = torch.sigmoid(logits)
# Print the probabilities for each toxicity type
print(probabilities)
Model tree for lfournier/ToxicityClassifier-RoBERTa
Base model: FacebookAI/roberta-base