eliasalbouzidi committed
Commit 2b27b16 • Parent(s): 3d32e54
Update README.md

README.md CHANGED
@@ -36,7 +36,9 @@ The model was trained using a dataset containing 190,000 labeled text samples, d
 
 The model is based on the Distilbert-base model.
 
-In terms of performance, the model has achieved a score of 0.
+In terms of performance, the model has achieved a score of 0.97 for F1.
+
+To improve the performance of the model, it is necessary to preprocess the input text. You can refer to the preprocess function in the app.py file in the following space: <https://huggingface.co/spaces/eliasalbouzidi/distilbert-nsfw-text-classifier>.
 ### Model Description
 
 The model can be used directly to classify text into one of the two classes. It takes in a string of text as input and outputs a probability distribution over the two classes. The class with the highest probability is selected as the predicted class.
@@ -46,7 +48,7 @@ The model can be used directly to classify text into one of the two classes. It
 - **Developed by:** Centrale Supélec Students
 - **Model type:** 60M
 - **Language(s) (NLP):** English
-- **License:**
+- **License:** apache-2.0
 
 
 ### Training Procedure
@@ -61,18 +63,15 @@ The training data for finetuning the text classification model consists of a lar
 
 90,000 examples labeled as "nsfw"
 
-The data was preprocessed to remove
+The data was preprocessed for example to remove numbers, punctuation, urls ...
 
 After fine-tuning the DistilBERT-base model on this dataset, transfer learning was applied using a smaller dataset. For transfer learning, the original layers of the fine-tuned DistilBERT model were frozen, and only the classification layers were fine-tuned on an additional dataset containing 40,000 examples.
 
-More information about the training data can be found in the Dataset Card (availabe soon).
-
 ## Uses
 
 The model can be integrated into larger systems for content moderation or filtering.
 
-
 ### Out-of-Scope Use
 
 It should not be used for any illegal activities.
@@ -96,6 +95,7 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
 tokenizer = AutoTokenizer.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
 
 model = AutoModelForSequenceClassification.from_pretrained("eliasalbouzidi/distilbert-nsfw-text-classifier")
+
 ```
 ### Use a pipeline
 ```python
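The commit says input text should be preprocessed (numbers, punctuation, and URLs removed) before classification, deferring to a preprocess function in the Space's app.py, which is not shown here. A minimal sketch of such a cleaner, assuming simple regex rules for only the steps the README names plus whitespace collapsing, might look like:

```python
import re

def preprocess(text: str) -> str:
    """Hypothetical cleaner mirroring the steps the README mentions:
    strip URLs, numbers, and punctuation, then collapse whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"\d+", " ", text)                    # drop numbers
    text = re.sub(r"[^\w\s]", " ", text)                # drop punctuation
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace

print(preprocess("Visit https://example.com! 100% FREE pics..."))
# → "Visit FREE pics"
```

The exact rules in the Space's app.py may differ; this only illustrates the kind of normalization the commit refers to.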
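The usage snippet in the diff loads the tokenizer and model but cuts off before showing inference. Since the README states the model outputs a probability distribution over the two classes and picks the most probable one, a sketch of that forward pass (standard transformers/PyTorch API; the example sentence is arbitrary and the label names are read from the model config rather than assumed) could be:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "eliasalbouzidi/distilbert-nsfw-text-classifier"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

text = "A family-friendly recipe for chocolate chip cookies."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits         # shape (1, 2)
probs = logits.softmax(dim=-1).squeeze(0)   # distribution over the 2 classes
pred = model.config.id2label[int(probs.argmax())]
print(pred, float(probs.max()))
```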