morit committed on
Commit
7500b48
1 Parent(s): 94da3df

Update README.md

  1. README.md +59 -0
README.md CHANGED
@@ -1,3 +1,62 @@
1
  ---
2
  license: mit
3
+ datasets:
4
+ - xnli
5
+ - multi_nli
6
+ language:
7
+ - ar
8
+ - bg
9
+ - de
10
+ - el
11
+ - en
12
+ - es
13
+ - tr
14
+ - th
15
+ - ur
16
+ - hi
17
+ - zh
18
+ - vi
19
+ - fr
20
+ - ru
21
+ - sw
22
+ metrics:
23
+ - accuracy
24
  ---
25
+
26
+ # XLM-T-ROBERTA-BASE-MNLI-XNLI
27
+
28
+ ## Model description
29
+ This model builds on XLM-RoBERTa-base, with pre-training continued on a large multilingual corpus of tweets.
30
+ It was developed following a strategy similar to the one introduced in the [TweetEval](https://github.com/cardiffnlp/tweeteval) framework.
31
+ The model is then fine-tuned on the MNLI dataset and subsequently on the XNLI dataset.
32
+
33
+ ## Intended Usage
34
+
35
+ This model was developed for zero-shot text classification in the domain of hate speech detection. It is fine-tuned on the full XNLI train set, which covers 15 languages:
36
+ **ar, bg, de, en, el, es, fr, hi, ru, sw, th, tr, ur, vi, zh**
37
+ Because the base model was pre-trained on 100 languages, it has also shown some effectiveness in languages outside this set. For the full list, refer to the [XLM-RoBERTa paper](https://arxiv.org/abs/1911.02116).
38
+
39
+ ### Usage with Zero-Shot Classification pipeline
40
+ ```python
41
+ from transformers import pipeline
42
+ classifier = pipeline("zero-shot-classification",
43
+                       model="morit/xlm-t-roberta-base-mnli-xnli")
44
+ ```
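A minimal sketch of how the loaded pipeline might be called for hate speech detection. The candidate labels, the hypothesis template, and the example sentence are illustrative assumptions, not values from this model card; `candidate_labels` and `hypothesis_template` are standard arguments of the zero-shot pipeline. The helper `build_hypotheses` is hypothetical and only shows which NLI hypotheses get scored against the input text.

```python
# Illustrative values only: these labels and this template are assumptions,
# not taken from the model card.
CANDIDATE_LABELS = ["hate speech", "not hate speech"]
HYPOTHESIS_TEMPLATE = "This text is {}."


def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Return the NLI hypotheses formed from the template, one per label."""
    return [template.format(label) for label in labels]


def classify(text, labels=CANDIDATE_LABELS, template=HYPOTHESIS_TEMPLATE):
    """Score `text` against each candidate label with the zero-shot pipeline."""
    # Imported lazily so build_hypotheses works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification",
        model="morit/xlm-t-roberta-base-mnli-xnli",
    )
    # The result is a dict with "sequence", "labels" (sorted by score), "scores".
    return classifier(text, candidate_labels=labels, hypothesis_template=template)


if __name__ == "__main__":
    print(build_hypotheses(CANDIDATE_LABELS))
    # Uncomment to run the model (downloads the weights on first use):
    # print(classify("I really enjoyed the concert last night."))
```

Because the model is multilingual, the input text and the hypothesis template do not need to be in the same language, though results may vary by language pair.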
45
+
46
+
47
+ ## Training
48
+
49
+ This model was pre-trained on a set of 100 languages, as described in the original paper. It was then fine-tuned on the NLI task using the concatenated MNLI train set. Finally, it was trained for one additional epoch on XNLI data only, with the translations shuffled so that the premise and hypothesis of each example come from the same original English example but are in different languages.
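The cross-lingual shuffling described above could be sketched as follows. This is not the author's training code: the record layout (premise and hypothesis stored as `{language: translation}` dictionaries), the helper name `mix_languages`, and the sample sentences are all assumptions made for illustration.

```python
import random

# Hypothetical, simplified record: every translation of one original English
# example, keyed by language code, plus its NLI label.
EXAMPLE = {
    "premise": {
        "en": "A man is playing a guitar.",
        "de": "Ein Mann spielt Gitarre.",
        "es": "Un hombre toca la guitarra.",
    },
    "hypothesis": {
        "en": "A person is making music.",
        "de": "Eine Person macht Musik.",
        "es": "Una persona hace música.",
    },
    "label": "entailment",
}


def mix_languages(example, rng):
    """Pair the premise in one language with the hypothesis in another."""
    languages = sorted(set(example["premise"]) & set(example["hypothesis"]))
    # sample() draws without replacement, so the two languages always differ.
    premise_lang, hypothesis_lang = rng.sample(languages, 2)
    return {
        "premise": example["premise"][premise_lang],
        "hypothesis": example["hypothesis"][hypothesis_lang],
        "label": example["label"],
        "premise_lang": premise_lang,
        "hypothesis_lang": hypothesis_lang,
    }


pair = mix_languages(EXAMPLE, random.Random(0))
print(pair["premise_lang"], "->", pair["hypothesis_lang"])
print(pair["premise"], "|", pair["hypothesis"])
```

Both sentences still derive from the same source example, so the label is unchanged; only the language pairing varies.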
50
+ The following hyper-parameters were chosen:
51
+ - learning rate: 2e-5
52
+ - batch size: 32
53
+ - max sequence length: 128
54
+
55
+ Training used one GPU (NVIDIA GeForce RTX 3090).
56
+
57
+ ## Evaluation
58
+ The model was evaluated on the test sets of all 15 XNLI languages, resulting in the following accuracies:
59
+
60
+ | ar    | bg    | de    | en    | el    | es    | fr    | hi    | ru    | sw    | th    | tr    | ur    | vi    | zh    |
61
+ |-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
62
+ | 0.776 | 0.804 | 0.796 | 0.791 | 0.851 | 0.813 | 0.806 | 0.757 | 0.783 | 0.716 | 0.765 | 0.780 | 0.705 | 0.795 | 0.782 |
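To compare languages at a glance, the per-language accuracies can be summarized with a few lines of Python. The values are copied from the table as written; the helper name `summarize` is an assumption introduced for this sketch.

```python
# Per-language XNLI test accuracies, copied from the table as written.
ACCURACY = {
    "ar": 0.776, "bg": 0.804, "de": 0.796, "en": 0.791, "el": 0.851,
    "es": 0.813, "fr": 0.806, "hi": 0.757, "ru": 0.783, "sw": 0.716,
    "th": 0.765, "tr": 0.780, "ur": 0.705, "vi": 0.795, "zh": 0.782,
}


def summarize(scores):
    """Return the unweighted mean plus the best and worst scoring languages."""
    mean = sum(scores.values()) / len(scores)
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return mean, best, worst


mean, best, worst = summarize(ACCURACY)
print(f"mean accuracy over {len(ACCURACY)} languages: {mean:.3f}")
print(f"highest: {best} ({ACCURACY[best]:.3f})")
print(f"lowest:  {worst} ({ACCURACY[worst]:.3f})")
```

The mean here is unweighted across languages, which treats every test set equally.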