Commit 0d5e6ab by megantosh
1 Parent(s): a1d98ad

Update README.md

  1. README.md +84 -3
README.md CHANGED
@@ -16,11 +16,16 @@ widget:
16
  # Arabic NER Model using Flair Embeddings
17
  Training ran for 94 epochs with a batch size of 32, using a linearly decaying learning rate of 2e-05, starting from 0.225, with GloVe and Flair forward and backward embeddings.
18
 
19
- Results:
20
  - F1-score (micro) 0.8666
21
  - F1-score (macro) 0.8488
22
 
23
- | | tp | fp | fn | precision | recall | class-F1 |
24
  |------|-----|----|----|-----------|--------|----------|
25
  | LOC | 539 | 51 | 68 | 0.9136 | 0.8880 | 0.9006 |
26
  | MISC | 408 | 57 | 89 | 0.8774 | 0.8209 | 0.8482 |
@@ -29,6 +34,71 @@ Results:
29
 
30
  ---
31
 
32
  ```
33
  2020-10-27 12:05:47,801 Model: "SequenceTagger(
34
  (embeddings): StackedEmbeddings(
@@ -59,4 +129,15 @@ Results:
59
  (weights): None
60
  (weight_tensor) None
61
 
62
- ```
 
16
  # Arabic NER Model using Flair Embeddings
17
  Training ran for 94 epochs with a batch size of 32, using a linearly decaying learning rate of 2e-05, starting from 0.225, with GloVe and Flair forward and backward embeddings.
18
 
19
+
20
+ ## Original Datasets:
21
+ - [AQMAR](http://www.cs.cmu.edu/~ark/ArabicNER/)
22
+ - [ANERcorp](http://curtis.ml.cmu.edu/w/courses/index.php/ANERcorp)
23
+
24
+ ## Results:
25
  - F1-score (micro) 0.8666
26
  - F1-score (macro) 0.8488
27
 
28
+ | | True Positives | False Positives | False Negatives | Precision | Recall | class-F1 |
29
  |------|-----|----|----|-----------|--------|----------|
30
  | LOC | 539 | 51 | 68 | 0.9136 | 0.8880 | 0.9006 |
31
  | MISC | 408 | 57 | 89 | 0.8774 | 0.8209 | 0.8482 |
 
34
 
35
  ---
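
The per-class columns in the results table follow from the raw counts. The sketch below uses the usual definitions of precision, recall and F1, plus micro and macro averaging; the function names are illustrative and not part of Flair. Only the LOC and MISC rows are visible in this diff, so the micro and macro values it computes cover those two classes alone and will not reproduce the reported 0.8666 and 0.8488.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


def micro_f1(counts: Dict[str, Counts]) -> float:
    """F1 over the summed counts of all classes."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)[2]


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(prf(*c)[2] for c in counts.values()) / len(counts)


# Only the two rows shown in the table; the other classes are not visible here.
counts = {"LOC": (539, 51, 68), "MISC": (408, 57, 89)}

for name, c in counts.items():
    p, r, f = prf(*c)
    print(f"{name}: precision={p:.4f} recall={r:.4f} class-F1={f:.4f}")
```

Rounded to four decimals, the printed values agree with the LOC and MISC rows of the table.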
36
 
37
+ # Usage
38
+ ```python
39
+ from flair.data import Sentence
40
+ from flair.models import SequenceTagger
41
+ from icecream import ic
42
+
43
+ tagger = SequenceTagger.load("julien-c/flair-ner")
44
+ arTagger = SequenceTagger.load('megantosh/flair-arabic-multi-ner')
45
+
46
+ sentence = Sentence('George Washington went to Washington .')
47
+ arSentence = Sentence('عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .')
48
+
49
+ # predict NER tags
50
+ tagger.predict(sentence)
51
+ arTagger.predict(arSentence)
52
+
53
+ # print each sentence with its predicted tags
54
+ ic(sentence.to_tagged_string())
55
+ ic(arSentence.to_tagged_string())
56
+
57
+ # print the predicted entity spans
58
+ for entity in sentence.get_spans('ner') + arSentence.get_spans('ner'):
59
+     ic(entity)
60
+ ```
61
+
62
+ # Example
63
+ ```bash
64
+ 2021-07-07 14:30:59,649 loading file /Users/mega/.flair/models/flair-ner/f22eb997f66ae2eacad974121069abaefca5fe85fce71b49e527420ff45b9283.941c7c30b38aef8d8a4eb5c1b6dd7fe8583ff723fef457382589ad6a4e859cfc
65
+ 2021-07-07 14:31:04,654 loading file /Users/mega/.flair/models/flair-arabic-multi-ner/c7af7ddef4fdcc681fcbe1f37719348afd2862b12aa1cfd4f3b93bd2d77282c7.242d030cb106124f7f9f6a88fb9af8e390f581d42eeca013367a86d585ee6dd6
66
+ ic| sentence.to_tagged_string: <bound method Sentence.to_tagged_string of Sentence: "George Washington went to Washington ." [− Tokens: 6 − Token-Labels: "George <B-PER> Washington <E-PER> went to Washington <S-LOC> ."]>
67
+ ic| arSentence.to_tagged_string: <bound method Sentence.to_tagged_string of Sentence: "عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة ." [− Tokens: 11 − Token-Labels: "عمرو <B-PER> عادلي <I-PER> أستاذ للاقتصاد السياسي المساعد في الجامعة <B-ORG> الأمريكية <I-ORG> بالقاهرة <B-LOC> ."]>
68
+ ic| entity: <PER-span (1,2): "George Washington">
69
+ ic| entity: <LOC-span (5): "Washington">
70
+ ic| entity: <PER-span (1,2): "عمرو عادلي">
71
+ ic| entity: <ORG-span (8,9): "الجامعة الأمريكية">
72
+ ic| entity: <LOC-span (10): "بالقاهرة">
73
+ ic| sentence.to_dict(tag_type='ner'):
74
+ {"text":"عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .",
75
+ "labels":[],
76
+ "entities":[
77
+ {"text":"عمرو عادلي",
78
+ "start_pos":0,
79
+ "end_pos":10,
80
+ "labels":[PER (0.9826)]},
81
+ {"text":"الجامعة الأمريكية",
82
+ "start_pos":45,
83
+ "end_pos":62,
84
+ "labels":[ORG (0.7679)]},
85
+ {"text":"بالقاهرة",
86
+ "start_pos":64,
87
+ "end_pos":72,
88
+ "labels":[LOC (0.8079)]}]}
89
+ {"text":"George Washington went to Washington .",
90
+ "labels":[],
91
+ "entities":[
92
+ {"text":"George Washington",
93
+ "start_pos":0,
94
+ "end_pos":17,
95
+ "labels":[PER (0.9968)]},
96
+ {"text":"Washington", "start_pos":26,
97
+ "end_pos":36,
98
+ "labels":[LOC (0.9994)]}]}
99
+ ```
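
The tagged strings in the example output mix two schemes: the English tagger emits BIOES prefixes (`B-`, `E-`, `S-`), while the Arabic tagger emits `B-`/`I-` only. As a rough illustration of how such a string maps to the entity spans listed above, here is a minimal pure-Python sketch. It is not Flair's own decoder, the helper names are hypothetical, and it assumes the `token <TAG>` layout shown in the output.

```python
import re
from typing import List, Tuple

# Matches tags such as <B-PER>, <I-ORG>, <E-PER>, <S-LOC>.
TAG_PATTERN = re.compile(r"<([BIES])-([^>]+)>")


def parse_tagged_string(tagged: str) -> List[Tuple[str, str, str]]:
    """Split 'token <TAG> token ...' into (token, prefix, label) triples.

    Tokens without a tag get the prefix "O" and an empty label.
    """
    triples: List[Tuple[str, str, str]] = []
    for item in tagged.split():
        match = TAG_PATTERN.fullmatch(item)
        if match and triples:
            # A tag annotates the token that precedes it.
            token = triples[-1][0]
            triples[-1] = (token, match.group(1), match.group(2))
        else:
            triples.append((item, "O", ""))
    return triples


def decode_spans(triples: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Group tagged tokens into (label, text) spans for BIO or BIOES tags."""
    spans: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = ""

    def close() -> None:
        nonlocal tokens, label
        if tokens:
            spans.append((label, " ".join(tokens)))
        tokens, label = [], ""

    for token, prefix, tag_label in triples:
        if prefix == "O":
            close()
        elif prefix in ("B", "S") or tag_label != label:
            # Start a new span (also covers a stray I-/E- with no open span).
            close()
            tokens, label = [token], tag_label
            if prefix in ("S", "E"):
                close()
        else:
            # I- or E- continuing the open span.
            tokens.append(token)
            if prefix == "E":
                close()
    close()
    return spans


english = "George <B-PER> Washington <E-PER> went to Washington <S-LOC> ."
print(decode_spans(parse_tagged_string(english)))
```

Applied to the Arabic tagged string from the output above, the same function yields one PER, one ORG and one LOC span, in line with the `ic| entity:` lines.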
100
+
101
+ # Configuration
102
  ```
103
  2020-10-27 12:05:47,801 Model: "SequenceTagger(
104
  (embeddings): StackedEmbeddings(
 
129
  (weights): None
130
  (weight_tensor) None
131
 
132
+ ```
133
+
134
+
135
+ # Citation
136
+ *If you use this model in your work, please consider citing:*
137
+ ```bibtex
138
+ @unpublished{MMHU21,
139
+ author = "M. Megahed and A. Akbik",
140
+ title = "Sequence Labeling Architectures in Diglossia",
141
+ note = "In preparation",
142
+ }
143
+ ```