Model Card for Toxicity Detection Model
Model Details
Model Description
This model is fine-tuned to detect various types of toxicity in text comments. It was trained on a dataset of labeled Wikipedia comments, where each comment is annotated with one or more toxicity categories. For a given text input, the model predicts the probability of each type of toxicity.
Developed by:
Louis Fournier, Enzo Medrinal, Christian Doan, and Clément Barbier
Funded by [optional]:
[More Information Needed]
Shared by [optional]:
[More Information Needed]
Model type:
Language Model (RoBERTa-based)
Language(s) (NLP):
English
License:
[More Information Needed]
Finetuned from model:
roberta-base
Model Sources [optional]:
- Repository: [temporary_link_to_repo]
- Paper: [More Information Needed]
- Demo: [More Information Needed]
Uses
Direct Use:
This model can be used directly to classify toxic comments. It predicts a probability for each of the following types of toxicity (a sketch of how to read these outputs follows the list):
- Toxic
- Severe Toxic
- Obscene
- Threat
- Insult
- Identity Hate
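As a rough sketch of how these outputs can be read, the six probabilities returned by the model can be zipped with the label names above. The snake_case label order and the 0.5 threshold below are assumptions for illustration only; check the actual ordering against the fine-tuned model's id2label configuration.
import torch
# Assumed label order, for illustration only; verify it against the model's
# config.id2label mapping before relying on it.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
def probabilities_to_labels(probabilities, threshold=0.5):
    # Map a (1, 6) tensor of sigmoid probabilities to (label, score) pairs above the threshold
    scores = probabilities.squeeze(0).tolist()
    return [(label, score) for label, score in zip(LABELS, scores) if score >= threshold]
# Example with made-up probabilities for a single comment
example = torch.tensor([[0.91, 0.12, 0.84, 0.03, 0.77, 0.05]])
print(probabilities_to_labels(example))  # flags 'toxic', 'obscene', and 'insult'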
Downstream Use [optional]:
The model can be integrated into applications that moderate or filter toxic content in user-generated text (a minimal moderation hook is sketched after this list), such as:
- Online comment sections
- Social media platforms
- Customer feedback systems
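As a minimal sketch of such an integration, a moderation hook could hold a comment for review whenever any toxicity probability crosses a threshold. The model identifier is taken from the model tree at the end of this card and the 0.5 threshold is an arbitrary choice; both should be adjusted for the target application.
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
# Assumed model location (see the model tree at the end of this card);
# a local path to the fine-tuned weights works as well.
MODEL_PATH = "lfournier/ToxicityClassifier-RoBERTa"
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()
def should_flag(comment, threshold=0.5):
    # Return True if any of the six toxicity probabilities exceeds the threshold
    inputs = tokenizer(comment, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        probabilities = torch.sigmoid(model(**inputs).logits)
    return bool((probabilities >= threshold).any())
# Example: hold flagged comments for human review instead of publishing them directly
if should_flag("This is a comment example"):
    print("Comment held for moderation")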
Out-of-Scope Use:
This model is not intended for use in:
- Detecting general sentiment (e.g., positive, negative, neutral).
- Predicting toxicity in languages other than English.
Bias, Risks, and Limitations
The model may exhibit biases in its predictions based on the language and topics present in the training data. It has been trained on Wikipedia comments, which may not fully represent the diversity of online discourse. The model may also struggle with:
- Overfitting to specific types of toxicity or language use found in the training data.
- False positives/negatives in detecting toxicity, particularly in ambiguous cases.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. It is recommended to review model outputs in context and combine the model with human moderation for high-stakes applications.
How to Get Started with the Model
Use the following code to get started with the model:
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
# Load the fine-tuned model and the base RoBERTa tokenizer
model = RobertaForSequenceClassification.from_pretrained('path_to_finetuned_model')  # e.g. 'lfournier/ToxicityClassifier-RoBERTa'
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
# Example input
text = "This is a comment example"
# Tokenize the input text
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
# Sigmoid (not softmax) yields an independent probability for each toxicity type (multi-label output)
probabilities = torch.sigmoid(logits)
# Print the probabilities for each toxicity type
print(probabilities)
Model tree for lfournier/ToxicityClassifier-RoBERTa
Base model: FacebookAI/roberta-base