---
datasets:
- google/jigsaw_toxicity_pred
language:
- en
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
---
# Model Card for Toxicity Detection Model

## Model Details

### Model Description
This model is fine-tuned to detect various types of toxicity in text comments. It was trained on a dataset of labeled Wikipedia comments where each comment is classified into one or more categories of toxicity. The model predicts the probability of each type of toxicity for a given text input.

### Developed by:
Louis Fournier, Enzo Medrinal, Christian Doan, and Clément Barbier

### Funded by [optional]:
[More Information Needed]

### Shared by [optional]:
[More Information Needed]

### Model type:
Language Model (RoBERTa-based)

### Language(s) (NLP):
English

### License:
[More Information Needed]

### Finetuned from model:
`roberta-base`

### Model Sources [optional]:
- Repository: [temporary_link_to_repo]
- Paper: [More Information Needed]
- Demo: [More Information Needed]

---

## Uses

### Direct Use:
This model can be directly used for the classification of toxic comments. It predicts the probabilities for each of the following types of toxicity:
- Toxic
- Severe Toxic
- Obscene
- Threat
- Insult
- Identity Hate
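
As a minimal sketch of how the model's 1 × 6 vector of sigmoid scores (as produced in the snippet under "How to Get Started with the Model") could be mapped to these category names: the label order below is an assumption that mirrors the list above, so verify it against `model.config.id2label` on the fine-tuned checkpoint if that mapping was saved during training.

```python
# Assumed label order -- mirrors the list above; verify against the checkpoint's config.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def scores_to_dict(probabilities):
    """Map a 1 x 6 tensor of sigmoid scores to a {label: score} dictionary."""
    return {label: float(score) for label, score in zip(LABELS, probabilities.squeeze(0))}
```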

### Downstream Use [optional]:
The model can be integrated into applications that aim to moderate or filter toxic content in user-generated text, such as:
- Online comment sections
- Social media platforms
- Customer feedback systems
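
For integrations like these, a minimal moderation-filter sketch might look as follows. The checkpoint path (`path_to_finetuned_model`), the 0.5 decision threshold, and the label order are illustrative assumptions rather than values fixed by this card.

```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer
import torch

# Illustrative placeholders: checkpoint path, decision threshold, and label order are assumptions.
MODEL_PATH = "path_to_finetuned_model"
THRESHOLD = 0.5
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()

def flag_comment(text: str) -> list[str]:
    """Return the toxicity categories whose predicted probability exceeds the threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probabilities = torch.sigmoid(model(**inputs).logits).squeeze(0)
    return [label for label, score in zip(LABELS, probabilities) if score.item() >= THRESHOLD]

# A comment passes moderation only if no category is flagged.
print(flag_comment("This is a comment example"))
```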

### Out-of-Scope Use:
This model is not intended for use in:
- Detecting general sentiments (e.g., positive, negative, neutral).
- Predicting toxicity in languages other than English.

---

## Bias, Risks, and Limitations

The model may exhibit biases in its predictions based on the language and topics present in the training data. It has been trained on Wikipedia comments, which may not fully represent the diversity of online discourse. The model may also struggle with:
- **Overfitting** to specific types of toxicity or language use found in the training data.
- **False positives/negatives** in detecting toxicity, particularly in ambiguous cases.

---

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. It is recommended to review model outputs in context and combine the model with human moderation for high-stakes applications.

---

## How to Get Started with the Model

Use the following code to get started with the model:

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# Load the fine-tuned model and the tokenizer
model = RobertaForSequenceClassification.from_pretrained('path_to_finetuned_model')
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Example input
text = "This is a comment example"

# Tokenize the input text
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.sigmoid(logits)  # independent probability per toxicity label

# Print the probabilities for each toxicity type
print(probabilities)
```
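
Because the six toxicity categories are not mutually exclusive, the logits are passed through a sigmoid rather than a softmax: `probabilities` holds one independent score between 0 and 1 per category, and several categories can be active for the same comment.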