eliasalbouzidi/distilbert-nsfw-text-classifier

@@ -36,7 +36,9 @@ The model was trained using a dataset containing 190,000 labeled text samples, d
 The model is based on the Distilbert-base model.
-In terms of performance, the model has achieved a score of 0.988 for F1.
 ### Model Description
 The model can be used directly to classify text into one of the two classes. It takes in a string of text as input and outputs a probability distribution over the two classes. The class with the highest probability is selected as the predicted class.
@@ -46,7 +48,7 @@ The model can be used directly to classify text into one of the two classes. It
 - **Developed by:** Centrale Supélec Students
 - **Model type:** 60M
 - **Language(s) (NLP):** English
-- **License:** MIT
 ### Training Procedure
@@ -61,18 +63,15 @@ The training data for finetuning the text classification model consists of a lar
 90,000 examples labeled as "nsfw"
-The data was preprocessed to remove stop words and punctuation, and to convert all text to lowercase.
 After fine-tuning the DistilBERT-base model on this dataset, transfer learning was applied using a smaller dataset. For transfer learning, the original layers of the fine-tuned DistilBERT model were frozen, and only the classification layers were fine-tuned on an additional dataset containing 40,000 examples.
-More information about the training data can be found in the Dataset Card (available soon).
 ## Uses
 The model can be integrated into larger systems for content moderation or filtering.
 ### Out-of-Scope Use
 It should not be used for any illegal activities.
@@ -96,6 +95,7 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
 tokenizer = AutoTokenizer.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
 model = AutoModelForSequenceClassification.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
 ```
 ### Use a pipeline
 ```python

 The model is based on the Distilbert-base model.
+The model achieves an F1 score of 0.97.
+For best performance, preprocess the input text before classifying it. See the `preprocess` function in the `app.py` file of the following Space: <https://huggingface.co/spaces/eliasalbouzidi/distilbert-nsfw-text-classifier>.
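As a rough illustration of what input preprocessing might look like, here is a minimal sketch that strips URLs, numbers, and punctuation. It is not the `preprocess` function from the Space's `app.py`; the name `basic_preprocess` and the regular expressions are assumptions made for this example, and the real function may apply different or additional steps.

```python
import re
import string

# Hypothetical patterns chosen for illustration; not taken from app.py.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def basic_preprocess(text: str) -> str:
    """Remove URLs, digits, and ASCII punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)        # drop URLs before punctuation is stripped
    text = DIGIT_RE.sub(" ", text)      # drop runs of digits
    text = text.translate(PUNCT_TABLE)  # delete ASCII punctuation characters
    return " ".join(text.split())       # normalise whitespace


cleaned = basic_preprocess("Visit https://example.com now!!! 123 times.")
print(cleaned)
```

URLs are removed first so that stripping punctuation does not leave fragments such as `httpsexamplecom` behind.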
 ### Model Description
 The model can be used directly to classify text into one of the two classes. It takes in a string of text as input and outputs a probability distribution over the two classes. The class with the highest probability is selected as the predicted class.
 - **Developed by:** Centrale Supélec Students
 - **Model type:** 60M
 - **Language(s) (NLP):** English
+- **License:** apache-2.0
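The prediction step described above (a probability distribution over two classes, with the most probable class selected) can be sketched in plain Python. The label names and their order in `LABELS` are assumptions for illustration; the model's actual label mapping is defined in its configuration.

```python
import math

# Assumed label order for illustration only; check the model config for the real mapping.
LABELS = ["safe", "nsfw"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels=LABELS):
    """Return the most probable label together with the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


label, probs = predict([0.0, 2.0])
print(label, probs)
```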
 ### Training Procedure
 90,000 examples labeled as "nsfw"
+The data was preprocessed to remove, for example, numbers, punctuation, and URLs.
 After fine-tuning the DistilBERT-base model on this dataset, transfer learning was applied using a smaller dataset. For transfer learning, the original layers of the fine-tuned DistilBERT model were frozen, and only the classification layers were fine-tuned on an additional dataset containing 40,000 examples.
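The freezing step described above could look roughly like the following sketch. It assumes that "the original layers" correspond to the `distilbert` submodule of `DistilBertForSequenceClassification` and that "the classification layers" are its `pre_classifier` and `classifier` heads; the authors' actual training script is not shown in this card. A tiny, randomly initialised configuration is used so the example runs without downloading weights.

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Tiny config for illustration only; the real model uses the DistilBERT-base dimensions.
config = DistilBertConfig(
    vocab_size=100,
    max_position_embeddings=32,
    n_layers=1,
    n_heads=2,
    dim=16,
    hidden_dim=32,
    num_labels=2,
)
model = DistilBertForSequenceClassification(config)

# Freeze the transformer body so its weights are not updated during training.
for param in model.distilbert.parameters():
    param.requires_grad = False

# Only the classification head parameters remain trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With the body frozen, an optimizer built from `filter(lambda p: p.requires_grad, model.parameters())` would update only the head.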
 ## Uses
 The model can be integrated into larger systems for content moderation or filtering.
 ### Out-of-Scope Use
 It should not be used for any illegal activities.
 tokenizer = AutoTokenizer.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
 model = AutoModelForSequenceClassification.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
 ```
 ### Use a pipeline
 ```python