Commit 4fa6ac6 by EIStakovskii · 1 parent: 376faae

Update README.md

Files changed (1)
  1. README.md +7 -2
README.md CHANGED
@@ -2,17 +2,22 @@
  language: fr # <-- my language
  widget:
  - text: "J'aime ta coiffure"
+ example_title: "NOT TOXIC 1"
  - text: "Va te faire foutre"
+ example_title: "TOXIC 1"
  - text: "Quel mauvais temps, n'est-ce pas ?"
+ example_title: "NOT TOXIC 2"
  - text: "J'espère que tu vas mourir, connard !"
+ example_title: "TOXIC 2"
  - text: "j'aime beaucoup ta veste"
+ example_title: "NOT TOXIC 3"
 
  license: other
  ---
  This model was trained for toxicity labeling. Label_1 means TOXIC, Label_0 means NOT TOXIC
 
- The model was fine-tuned based off the CamemBERT language model https://huggingface.co/camembert-base .
+ The model was fine-tuned based off [the CamemBERT language model](https://huggingface.co/camembert-base).
 
  The accuracy is 93% on the test split during training and 79% on a manually picked (and thus harder) sample of 200 sentences (100 label 1, 100 label 0) at the end of the training.
 
- The model was finetuned on 32k sentences. The train data was the translations of the english data (around 30k sentences) from https://github.com/s-nlp/multilingual_detox with https://huggingface.co/Helsinki-NLP/opus-mt-en-fr and the data from the jigsaw dataset on kaggle https://www.kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification/data .
+ The model was finetuned on 32k sentences. The train data was the translations of the English data (around 30k sentences) from [the multilingual_detox dataset](https://github.com/s-nlp/multilingual_detox) by [Skolkovo Institute](https://huggingface.co/SkolkovoInstitute) using [the opus-mt-en-fr translation model](https://huggingface.co/Helsinki-NLP/opus-mt-en-fr) by [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) and the data from [the jigsaw dataset](https://www.kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification/data) on kaggle.
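
The README states that Label_1 means TOXIC and Label_0 means NOT TOXIC. A minimal sketch of applying that mapping to classifier output might look like the following. It assumes predictions arrive as dicts with a `label` key spelled `LABEL_0` / `LABEL_1`, as a text-classification pipeline with default label names typically returns them. The helper name `readable_label` and the sample predictions are hypothetical and hand-written; they are not output from this model, whose repository id is not given in this commit.

```python
# Label convention taken from the README: Label_1 = TOXIC, Label_0 = NOT TOXIC.
# Keys are lowercased so "Label_1" and "LABEL_1" both match (spelling is an assumption).
LABEL_NAMES = {"label_0": "NOT TOXIC", "label_1": "TOXIC"}


def readable_label(prediction):
    """Map one prediction dict, e.g. {"label": "LABEL_1", ...}, to a readable name."""
    raw = prediction["label"]
    try:
        return LABEL_NAMES[raw.lower()]
    except KeyError:
        raise ValueError(f"unexpected label: {raw!r}") from None


# Hand-written stand-ins for classifier output (NOT real model results).
sample_predictions = [
    {"label": "LABEL_0", "score": 0.5},
    {"label": "LABEL_1", "score": 0.5},
]

readable = [readable_label(p) for p in sample_predictions]
print(readable)
```

Raising on an unknown label, rather than defaulting to one class, keeps a misconfigured label set from being silently reported as NOT TOXIC.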