---
datasets:
- google/jigsaw_toxicity_pred
language:
- en
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
---

# Model Card for Toxicity Detection Model

## Model Details

### Model Description

This model is fine-tuned to detect various types of toxicity in text comments. It was trained on a dataset of labeled Wikipedia comments in which each comment is assigned one or more toxicity categories. Given a text input, the model predicts a probability for each type of toxicity.

### Developed by
Louis Fournier, Enzo Medrinal, Christian Doan, and Clément Barbier

### Funded by [optional]
[More Information Needed]

### Shared by [optional]
[More Information Needed]

### Model type
Language Model (RoBERTa-based)

### Language(s) (NLP)
English

### License
[More Information Needed]

### Finetuned from model
`roberta-base`

### Model Sources [optional]

- Repository: [temporary_link_to_repo]
- Paper: [More Information Needed]
- Demo: [More Information Needed]

---

## Uses

### Direct Use

This model can be used directly for the classification of toxic comments. It predicts a probability for each of the following types of toxicity:

- Toxic
- Severe Toxic
- Obscene
- Threat
- Insult
- Identity Hate

### Downstream Use [optional]

The model can be integrated into applications that moderate or filter toxic content in user-generated text, such as:

- Online comment sections
- Social media platforms
- Customer feedback systems

### Out-of-Scope Use

This model is not intended for:

- Detecting general sentiment (e.g., positive, negative, neutral).
- Predicting toxicity in languages other than English.

---

## Bias, Risks, and Limitations

The model may exhibit biases in its predictions depending on the language and topics present in the training data. It was trained on Wikipedia comments, which may not fully represent the diversity of online discourse. The model may also struggle with:

- **Overfitting** to specific types of toxicity or language use found in the training data.
- **False positives/negatives** in detecting toxicity, particularly in ambiguous cases.

---

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. It is recommended to review model outputs in context and to combine the model with human moderation for high-stakes applications.

---

## How to Get Started with the Model

Use the following code to get started with the model:

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# Load the fine-tuned model and the base tokenizer
model = RobertaForSequenceClassification.from_pretrained('path_to_finetuned_model')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Example input
text = "This is a comment example"

# Tokenize the input text
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Get model predictions; apply a sigmoid because this is a multi-label task
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.sigmoid(logits)

# Print the probabilities for each toxicity type
print(probabilities)
```
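
The raw tensor printed above is easier to interpret once each probability is paired with its label. The sketch below continues from the snippet above; it assumes the Jigsaw label order (`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`) and a 0.5 decision threshold, neither of which is guaranteed for your checkpoint. Verify the actual label order via `model.config.id2label` and tune the threshold on validation data.

```python
# Assumed label order -- confirm against model.config.id2label for your checkpoint
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Pair each label with its predicted probability for the single input example
scores = dict(zip(labels, probabilities[0].tolist()))

# Multi-label decision: keep every label whose probability clears the threshold
# (0.5 is only a default; tune it, possibly per label, on a validation set)
threshold = 0.5
predicted = [label for label, score in scores.items() if score >= threshold]

print(scores)
print("Predicted toxicity types:", predicted)
```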