eliasalbouzidi committed
Commit: a3a4f0a
Parent(s): c6ac130
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ widget:
     example_title: Nsfw
   - text: A mass shooting
     example_title: Nsfw
-base_model: distilbert-base-uncased
+base_model: distilbert-base-uncased
 license: apache-2.0
 language:
 - en
@@ -28,6 +28,24 @@ tags:
 - safety
 - innapropriate
 - distilbert
+datasets:
+- eliasalbouzidi/NSFW-Safe-Dataset
+model-index:
+- name: NSFW-Safe-Dataset
+  results:
+  - task:
+      name: Text Classification
+      type: text-classification
+    dataset:
+      name: NSFW-Safe-Dataset
+      type: .
+    metrics:
+    - name: F1
+      type: f1
+      value: 0.974
+    - name: Accuracy
+      type: accuracy
+      value: 0.98
 ---
 
 # Model Card
@@ -53,6 +71,9 @@ The model can be used directly to classify text into one of the two classes. It
 - **Language(s) (NLP):** English
 - **License:** apache-2.0
 
+### Uses
+
+The model can be integrated into larger systems for content moderation or filtering.
 ### Training Data
 The training data for finetuning the text classification model consists of a large corpus of text labeled with one of the two classes: "safe" and "nsfw". The dataset contains a total of 190,000 examples, which are distributed as follows:
 
@@ -62,6 +83,7 @@ The training data for finetuning the text classification model consists of a lar
 
 It was assembled by scraping data from the web and utilizing existing open-source datasets. A significant portion of the dataset consists of descriptions for images and scenes. The primary objective was to prevent diffusers from generating NSFW content but it can be used for other moderation purposes.
 
+You can access the dataset : https://huggingface.co/datasets/eliasalbouzidi/NSFW-Safe-Dataset
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -120,11 +142,6 @@ We selected the checkpoint with the highest F-beta1.6 score.
 - Tokenizers 0.19.1
 
 
-## Uses
-
-The model can be integrated into larger systems for content moderation or filtering.
-
-
 ### Out-of-Scope Use
 
 It should not be used for any illegal activities.