---
license: apache-2.0
tags:
- Cross-lingual-nlp
- zero-shot-transfer
- toxicity-analysis
- abuse-detection
- flag-user
- block-user
- multilinguality
- XLM-R
requirements:
- sentencepiece (if not installed, install it with `pip install sentencepiece` and restart the runtime)
---

# Model Card for PolyGuard

<!-- Provide a quick summary of what the model is/does. -->

This model helps developers, especially those with little to no NLP experience, flag or block users on their platforms directly. Because it also recognizes harmful or unethical comments, it can serve as a last layer of decision-making for AI systems, particularly those embedded in machines such as robots, before they act on a generated thought.
In a nutshell, the model helps AI systems judge whether a thought is ethical or moral and whether it should be acted upon, making `AI safer for all`.
The model aims to work with any arbitrary language, as long as it is supported by the XLM-R vector space aligner (embedder) model. Use cases: abuse detection, toxicity analysis, obscene language detection, and detection of harmful or unethical thoughts.

Languages supported:
- Afrikaans
- Albanian
- Amharic
- Arabic
- Armenian
- Assamese
- Azerbaijani
- Basque
- Belarusian
- Bengali
- Bhojpuri
- Bosnian
- Bulgarian
- Burmese
- Catalan
- Cebuano
- Chewa
- Chinese (Simplified)
- Chinese (Traditional)
- Chittagonian
- Corsican
- Croatian
- Czech
- Danish
- Deccan
- Dutch
- English
- Esperanto
- Estonian
- Filipino
- Finnish
- French
- Frisian
- Galician
- Georgian
- German
- Greek
- Gujarati
- Haitian Creole
- Hausa
- Hawaiian
- Hebrew
- Hindi
- Hmong
- Hungarian
- Icelandic
- Igbo
- Indonesian
- Irish
- Italian
- Japanese
- Javanese
- Kannada
- Kazakh
- Khmer
- Kinyarwanda
- Kirundi
- Korean
- Kurdish
- Kyrgyz
- Lao
- Latin
- Latvian
- Lithuanian
- Luxembourgish
- Macedonian
- Malagasy
- Malay
- Malayalam
- Maltese
- Maori
- Marathi
- Mongolian
- Nepali
- Norwegian
- Oriya
- Oromo
- Pashto
- Persian
- Polish
- Portuguese
- Punjabi
- Quechua
- Romanian
- Russian
- Samoan
- Scots Gaelic
- Serbian
- Shona
- Sindhi
- Sinhala
- Slovak
- Slovenian
- Somali
- Spanish
- Sundanese
- Swahili
- Swedish
- Tajik
- Tamil

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [Jayveersinh Raj](https://www.linkedin.com/in/jayveersinh-raj-67694222a/), [Khush Patel](https://www.linkedin.com/in/khush-patel-kp/)
- **Model type:** Cross-lingual-zero-shot-transfer
- **Language(s) (NLP):** Multilingual (see the supported languages list above)
- **Framework(s):** PyTorch, ONNX
- **License:** apache-2.0

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/Jayveersinh-Raj/cross-lingual-zero-shot-transfer
- **Paper:** See the GitHub repository above; give it a star if you find it useful.
- **Demo:** [Streamlit](https://jayveersinh-raj-cross-lingual-zero-shot-t-streamlit-app-x6l1as.streamlit.app/)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This model helps developers, especially those with little to no NLP experience, flag or block users on their platforms directly. It aims to work with any arbitrary language, as long as it is supported by the XLM-R vector space aligner (embedder) model. Use cases: abuse detection, toxicity analysis, obscene language detection.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
Use the model directly from Hugging Face. The following is an example:

    from transformers import XLMRobertaForSequenceClassification, AutoTokenizer
    import torch

    # Load the fine-tuned checkpoint and its tokenizer
    model_name = "Jayveersinh-Raj/PolyGuard"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = XLMRobertaForSequenceClassification.from_pretrained(model_name)

    text = "Jayveer is a great NLP engineer, and a noob in CV"

    # Tokenize, run the classifier, and convert logits to class probabilities
    inputs = tokenizer.encode(text, return_tensors="pt", max_length=512, truncation=True)
    outputs = model(inputs)[0]
    probabilities = torch.softmax(outputs, dim=1)
    predicted_class = torch.argmax(probabilities).item()

    # Class 1 corresponds to toxic, class 0 to non-toxic
    if predicted_class == 1:
        print("Toxic")
    else:
        print("Not toxic")

    


### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
Fine-tuning is not required; the model already performs well out of the box. However, it can be fine-tuned (see the sketch below) to add languages written in scripts different from the source, since the model does not perform well on a language whose script differs from the training script.
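
A minimal fine-tuning sketch, assuming a binary-labeled dataset of your own; the `train.csv` file with `text` and `label` columns is a hypothetical example, not a released dataset:

    from datasets import load_dataset
    from transformers import (AutoTokenizer, Trainer, TrainingArguments,
                              XLMRobertaForSequenceClassification)

    model_name = "Jayveersinh-Raj/PolyGuard"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = XLMRobertaForSequenceClassification.from_pretrained(model_name)

    # Hypothetical CSV with "text" and "label" (0 = non-toxic, 1 = toxic) columns
    dataset = load_dataset("csv", data_files={"train": "train.csv"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True)

    args = TrainingArguments(output_dir="polyguard-finetuned",
                             per_device_train_batch_size=16,
                             num_train_epochs=3)

    trainer = Trainer(model=model, args=args,
                      train_dataset=tokenized["train"],
                      tokenizer=tokenizer)
    trainer.train()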


### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
This model does not work on a language written in a script different from the training data; a transliteration layer has not been added yet.
Moreover, the model mostly flags severe toxicity, since toxicity is a subjective matter. In the context of flagging or blocking users, however, severity is very important, and the model is well balanced in that respect.


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Toxicity is a subjective issue; the model is balanced to flag mostly severe toxicity. In our evaluation it never flagged a non-toxic sentence as toxic (100% accuracy on non-toxic examples), making it a good choice for flagging or blocking users. If a language is very low-resource and/or distant from English, the model may misclassify, although performance remains good.


### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

## How to Get Started with the Model

Use the code below to get started with the model.
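
A minimal sketch using the `transformers` pipeline API; the returned label names (e.g. `LABEL_0`/`LABEL_1`) depend on the checkpoint's configuration and are an assumption here, with class 1 corresponding to toxic as in the Direct Use example above:

    from transformers import pipeline

    # Loads Jayveersinh-Raj/PolyGuard and its tokenizer in one call
    classifier = pipeline("text-classification", model="Jayveersinh-Raj/PolyGuard")

    result = classifier("Some user comment in any supported language")[0]
    print(result["label"], result["score"])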


## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The training data involves data from Google Jigsaw and Wikipedia. The training language is English, but a zero-shot mechanism is used to achieve multilinguality via vector space alignment.

### Training Procedure 

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

We merged all the sub-categories of toxicity into a single super-category (as sketched below), since all of them are severe and flaggable and/or blockable.
Class imbalance was present, but the state-of-the-art transformer architecture handles it well.
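
A minimal preprocessing sketch, assuming the column names of the public Jigsaw toxic comment dataset (`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`); the exact columns and file paths used for this model may differ:

    import pandas as pd

    df = pd.read_csv("train.csv")  # hypothetical path to the Jigsaw training CSV

    # Any positive sub-category marks the comment as toxic in the merged label
    subcategories = ["toxic", "severe_toxic", "obscene",
                     "threat", "insult", "identity_hate"]
    df["label"] = (df[subcategories].sum(axis=1) > 0).astype(int)

    df[["comment_text", "label"]].to_csv("merged_labels.csv", index=False)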

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->
The model performed better than GPT-4 and a human annotator who labeled the test-set comments as toxic. We arrived at this conclusion because (1) the predictions were manually checked, and (2) although GPT-4 refused to generate toxic sentences, when given test-set texts that the model left unflagged but the annotator had marked toxic, GPT-4 translated them and judged them toxic, yet not toxic enough to warrant blocking or flagging. Hence, the model is close to perfect in this regard, although its limitations and risks should still be taken into account.

### Testing Data, Factors & Metrics
1. Tested on human annotations
2. Tested on GPT-4 generated texts

#### Testing Data

<!-- This should link to a Data Card if possible. -->
The dataset is available on GitHub (see the repository linked above).

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Top-1 accuracy, since our data contains multiple languages.
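
A minimal sketch of how top-1 accuracy could be computed overall and per language, assuming a hypothetical evaluation file with `language`, `label`, and `prediction` columns:

    import pandas as pd

    eval_df = pd.read_csv("eval_predictions.csv")  # hypothetical file

    # Overall top-1 accuracy
    overall = (eval_df["prediction"] == eval_df["label"]).mean()

    # Per-language breakdown
    per_language = (eval_df.assign(correct=eval_df["prediction"] == eval_df["label"])
                    .groupby("language")["correct"].mean())

    print(overall)
    print(per_language)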

### Results
1. Tested on human annotations &rarr; 100% on non-toxic sentences; better than the human annotator, as discussed in Evaluation.
2. Tested on GPT-4 generated texts &rarr; 100%

#### Summary
Our model is well suited to flagging or blocking users for severely toxic comments, such as those using swear words or slang. It is ideal for this purpose because it flags only severe toxicity and is 100% accurate on non-toxic comments. However, everything noted above should be taken into consideration before using it. It supports all the languages supported by the XLM-R vector space aligner; see the list at the top of this card.