Update README.md
README.md
CHANGED
@@ -16,11 +16,16 @@ widget:
|
|
16 |
# Arabic NER Model using Flair Embeddings
|
17 |
Training ran for 94 epochs with a linearly decaying learning rate of 2e-05 (starting from 0.225) and a batch size of 32, using GloVe embeddings together with forward and backward Flair embeddings.
|
18 |
|
19 |
-
20 |
- F1-score (micro) 0.8666
|
21 |
- F1-score (macro) 0.8488
|
22 |
|
23 |
-
| |
|
24 |
|------|-----|----|----|-----------|--------|----------|
|
25 |
| LOC | 539 | 51 | 68 | 0.9136 | 0.8880 | 0.9006 |
|
26 |
| MISC | 408 | 57 | 89 | 0.8774 | 0.8209 | 0.8482 |
|
@@ -29,6 +34,71 @@ Results:
|
|
29 |
|
30 |
---
|
31 |
32 |
```
|
33 |
2020-10-27 12:05:47,801 Model: "SequenceTagger(
|
34 |
(embeddings): StackedEmbeddings(
|
@@ -59,4 +129,15 @@ Results:
|
|
59 |
(weights): None
|
60 |
(weight_tensor) None
|
61 |
|
62 |
-
```
16 |
# Arabic NER Model using Flair Embeddings
|
17 |
Training ran for 94 epochs with a linearly decaying learning rate of 2e-05 (starting from 0.225) and a batch size of 32, using GloVe embeddings together with forward and backward Flair embeddings.
|
18 |
|
19 |
+
|
20 |
+
## Original Datasets:
|
21 |
+
- [AQMAR](http://www.cs.cmu.edu/~ark/ArabicNER/)
|
22 |
+
- [ANERcorp](http://curtis.ml.cmu.edu/w/courses/index.php/ANERcorp)
|
23 |
+
|
24 |
+
## Results:
|
25 |
- F1-score (micro) 0.8666
|
26 |
- F1-score (macro) 0.8488
|
27 |
|
28 |
+
| | True Positives | False Positives | False Negatives | Precision | Recall | class-F1 |
|------|-----|----|----|-----------|--------|----------|
| LOC | 539 | 51 | 68 | 0.9136 | 0.8880 | 0.9006 |
| MISC | 408 | 57 | 89 | 0.8774 | 0.8209 | 0.8482 |
|
|
|
34 |
|
35 |
---
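
The per-class precision, recall and F1 values in the results table can be recomputed from the true-positive, false-positive and false-negative counts. The sketch below is only an illustration of the standard formulas, not the evaluation code used for this model; the function name `precision_recall_f1` and the `COUNTS` dictionary are made up for this example, and only the LOC and MISC rows visible above are included.

```python
from typing import Dict, Tuple

# Counts copied from the LOC and MISC rows of the results table:
# (true positives, false positives, false negatives).
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "LOC": (539, 51, 68),
    "MISC": (408, 57, 89),
}


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Hypothetical helper: standard precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    for label, (tp, fp, fn) in COUNTS.items():
        p, r, f1 = precision_recall_f1(tp, fp, fn)
        print(f"{label}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Rounded to four decimals, the results agree with the LOC and MISC rows of the table.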
|
36 |
|
37 |
+
# Usage
```python
from flair.data import Sentence
from flair.models import SequenceTagger
import pyarabic.araby as araby  # optional Arabic text helpers (not used below)
from icecream import ic

# load an English tagger for comparison and the Arabic tagger
tagger = SequenceTagger.load("julien-c/flair-ner")
arTagger = SequenceTagger.load('megantosh/flair-arabic-multi-ner')

sentence = Sentence('George Washington went to Washington .')
arSentence = Sentence('عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .')

# predict NER tags
tagger.predict(sentence)
arTagger.predict(arSentence)

# print sentence with predicted tags
# (call the method; passing it without parentheses prints the bound method's repr)
ic(sentence.to_tagged_string())
ic(arSentence.to_tagged_string())

# print the predicted entity spans and the dictionary view
for s in (sentence, arSentence):
    for entity in s.get_spans('ner'):
        ic(entity)
    ic(s.to_dict(tag_type='ner'))
```
|
61 |
+
|
62 |
+
# Example
|
63 |
+
```bash
2021-07-07 14:30:59,649 loading file /Users/mega/.flair/models/flair-ner/f22eb997f66ae2eacad974121069abaefca5fe85fce71b49e527420ff45b9283.941c7c30b38aef8d8a4eb5c1b6dd7fe8583ff723fef457382589ad6a4e859cfc
2021-07-07 14:31:04,654 loading file /Users/mega/.flair/models/flair-arabic-multi-ner/c7af7ddef4fdcc681fcbe1f37719348afd2862b12aa1cfd4f3b93bd2d77282c7.242d030cb106124f7f9f6a88fb9af8e390f581d42eeca013367a86d585ee6dd6
ic| sentence.to_tagged_string: <bound method Sentence.to_tagged_string of Sentence: "George Washington went to Washington ." [− Tokens: 6 − Token-Labels: "George <B-PER> Washington <E-PER> went to Washington <S-LOC> ."]>
ic| arSentence.to_tagged_string: <bound method Sentence.to_tagged_string of Sentence: "عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة ." [− Tokens: 11 − Token-Labels: "عمرو <B-PER> عادلي <I-PER> أستاذ للاقتصاد السياسي المساعد في الجامعة <B-ORG> الأمريكية <I-ORG> بالقاهرة <B-LOC> ."]>
ic| entity: <PER-span (1,2): "George Washington">
ic| entity: <LOC-span (5): "Washington">
ic| entity: <PER-span (1,2): "عمرو عادلي">
ic| entity: <ORG-span (8,9): "الجامعة الأمريكية">
ic| entity: <LOC-span (10): "بالقاهرة">
ic| sentence.to_dict(tag_type='ner'):
{"text":"عمرو عادلي أستاذ للاقتصاد السياسي المساعد في الجامعة الأمريكية بالقاهرة .",
 "labels":[],
 "entities":[
  {"text":"عمرو عادلي",
   "start_pos":0,
   "end_pos":10,
   "labels":[PER (0.9826)]},
  {"text":"الجامعة الأمريكية",
   "start_pos":45,
   "end_pos":62,
   "labels":[ORG (0.7679)]},
  {"text":"بالقاهرة",
   "start_pos":64,
   "end_pos":72,
   "labels":[LOC (0.8079)]}]}
{"text":"George Washington went to Washington .",
 "labels":[],
 "entities":[
  {"text":"George Washington",
   "start_pos":0,
   "end_pos":17,
   "labels":[PER (0.9968)]},
  {"text":"Washington",
   "start_pos":26,
   "end_pos":36,
   "labels":[LOC (0.9994)]}]}
```
|
100 |
+
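
The token labels in the output use `B-`/`I-`/`E-`/`S-` prefixes, while the `entity` lines show merged spans. As a rough illustration of how one maps to the other, here is a small pure-Python sketch that groups a tagged string into spans with character offsets. It is not Flair's own span decoder; `parse_tagged` and `decode_spans` are hypothetical helper names, and the offsets assume tokens are separated by exactly one space.

```python
import re
from typing import List, Optional, Tuple

# A tag looks like "<B-PER>": a prefix (B, I, E or S) and an entity label.
TAG_RE = re.compile(r"^<([BIES])-(\w+)>$")


def parse_tagged(tagged: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Split a tagged string into (token, prefix, label) triples."""
    triples: List[Tuple[str, Optional[str], Optional[str]]] = []
    for piece in tagged.split():
        match = TAG_RE.match(piece)
        if match and triples:
            # A tag belongs to the token immediately before it.
            token, _, _ = triples[-1]
            triples[-1] = (token, match.group(1), match.group(2))
        else:
            triples.append((piece, None, None))
    return triples


def decode_spans(tagged: str) -> List[Tuple[str, str, int, int]]:
    """Return (text, label, start_pos, end_pos) for every entity span."""
    spans: List[Tuple[str, str, int, int]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    start = 0
    offset = 0  # character offset of the current token

    def flush() -> None:
        nonlocal tokens, label
        if tokens:
            text = " ".join(tokens)
            spans.append((text, label, start, start + len(text)))
        tokens, label = [], None

    for token, prefix, tag in parse_tagged(tagged):
        inside = prefix in ("I", "E") and bool(tokens) and tag == label
        if inside:
            tokens.append(token)
        else:
            flush()  # close any open span before starting a new one
            if prefix is not None:
                tokens, label, start = [token], tag, offset
        if prefix in ("E", "S"):
            flush()  # E and S always end a span
        offset += len(token) + 1
    flush()
    return spans


if __name__ == "__main__":
    example = "George <B-PER> Washington <E-PER> went to Washington <S-LOC> ."
    for span in decode_spans(example):
        print(span)
```

The decoder accepts both the `B-`/`E-`/`S-` style seen in the English output and the plain `B-`/`I-` style seen in the Arabic output, since a span is also closed whenever the next token does not continue it.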
|
101 |
+
# Configuration
|
102 |
```
|
103 |
2020-10-27 12:05:47,801 Model: "SequenceTagger(
|
104 |
(embeddings): StackedEmbeddings(
|
|
|
129 |
(weights): None
|
130 |
(weight_tensor) None
|
131 |
|
132 |
+
```
|
133 |
+
|
134 |
+
|
135 |
+
# Citation
*If you use this model in your work, please consider citing:*
```latex
@unpublished{MMHU21,
  author = "M. Megahed and A. Akbik",
  title = "Sequence Labeling Architectures in Diglossia",
  note = "In preparation",
}
```
|