eliasalbouzidi committed on
Commit
2b27b16
1 Parent(s): 3d32e54

Update README.md

  1. README.md +6 -6
README.md CHANGED
@@ -36,7 +36,9 @@ The model was trained using a dataset containing 190,000 labeled text samples, d
 
  The model is based on the Distilbert-base model.
 
- In terms of performance, the model has achieved a score of 0.988 for F1.
  ### Model Description
 
  The model can be used directly to classify text into one of the two classes. It takes in a string of text as input and outputs a probability distribution over the two classes. The class with the highest probability is selected as the predicted class.
@@ -46,7 +48,7 @@ The model can be used directly to classify text into one of the two classes. It
  - **Developed by:** Centrale Supélec Students
  - **Model type:** 60M
  - **Language(s) (NLP):** English
- - **License:** MIT
 
 
  ### Training Procedure
@@ -61,18 +63,15 @@ The training data for finetuning the text classification model consists of a lar
 
  90,000 examples labeled as "nsfw"
 
- The data was preprocessed to remove stop words and punctuation, and to convert all text to lowercase.
 
  After fine-tuning the DistilBERT-base model on this dataset, transfer learning was applied using a smaller dataset. For transfer learning, the original layers of the fine-tuned DistilBERT model were frozen, and only the classification layers were fine-tuned on an additional dataset containing 40,000 examples.
 
- More information about the training data can be found in the Dataset Card (availabe soon).
-
  ## Uses
 
  The model can be integrated into larger systems for content moderation or filtering.
 
 
-
  ### Out-of-Scope Use
 
  It should not be used for any illegal activities.
@@ -96,6 +95,7 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
  tokenizer = AutoTokenizer.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
 
  model = AutoModelForSequenceClassification.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
  ```
  ### Use a pipeline
  ```python
 
 
  The model is based on the Distilbert-base model.
 
+ In terms of performance, the model has achieved a score of 0.97 for F1.
+
+ To improve the performance of the model, it is necessary to preprocess the input text. You can refer to the preprocess function in the app.py file in the following space: <https://huggingface.co/spaces/eliasalbouzidi/distilbert-nsfw-text-classifier>.
  ### Model Description
 
  The model can be used directly to classify text into one of the two classes. It takes in a string of text as input and outputs a probability distribution over the two classes. The class with the highest probability is selected as the predicted class.
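
As a rough illustration of the selection step described in the Model Description context above, the sketch below turns two raw scores into a probability distribution with a softmax and picks the most probable class. The label names and their order (`safe`, `nsfw`) and the example scores are assumptions made for illustration; only the "nsfw" label is named in the README, and the actual mapping would need to be read from the model's `id2label` config.

```python
import math

# Hypothetical label order; the real mapping lives in model.config.id2label.
LABELS = ["safe", "nsfw"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels=LABELS):
    """Return (label, probability) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up scores for the two classes, not real model output.
label, prob = predict([-1.2, 2.3])
print(label, round(prob, 3))
```

With a real checkpoint, the two scores would come from the model's output logits for one input string rather than from hand-written numbers.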
 
  - **Developed by:** Centrale Supélec Students
  - **Model type:** 60M
  - **Language(s) (NLP):** English
+ - **License:** apache-2.0
 
 
  ### Training Procedure
 
 
  90,000 examples labeled as "nsfw"
 
+ The data was preprocessed for example to remove numbers, punctuation, urls ...
 
  After fine-tuning the DistilBERT-base model on this dataset, transfer learning was applied using a smaller dataset. For transfer learning, the original layers of the fine-tuned DistilBERT model were frozen, and only the classification layers were fine-tuned on an additional dataset containing 40,000 examples.
 
  ## Uses
 
  The model can be integrated into larger systems for content moderation or filtering.
 
 
  ### Out-of-Scope Use
 
  It should not be used for any illegal activities.
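
The preprocessing line added in this hunk mentions removing numbers, punctuation, and URLs. A minimal sketch of that kind of cleaning is shown below. It is not the `preprocess` function from the Space's app.py, whose exact steps are not shown in this diff; the function name `clean_text`, the regular expressions, and the order of the steps are all assumptions.

```python
import re
import string

# Hypothetical patterns; the Space's actual preprocessing may differ.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
DIGITS_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Remove URLs, digit runs, and ASCII punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)        # drop URLs first, while "://" and "." are intact
    text = DIGITS_RE.sub(" ", text)     # drop numbers
    text = text.translate(PUNCT_TABLE)  # drop ASCII punctuation characters
    return " ".join(text.split())       # normalize runs of whitespace


print(clean_text("Visit https://example.com now!!! Only 3 left, hurry."))
```

URLs are removed before punctuation so that the URL pattern can still match; stripping punctuation first would leave fragments such as `httpsexamplecom` in the text.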
 
  tokenizer = AutoTokenizer.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
 
  model = AutoModelForSequenceClassification.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
+
  ```
  ### Use a pipeline
  ```python