Create README.md

#1
by dingerstner - opened
Files changed (1)
  1. README.md +97 -0
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ license: cc-by-4.0
+ language:
+ - he
+ ---
+ # DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew
+
+ State-of-the-art language model for Hebrew, released [here](https://arxiv.org/abs/2308.16687).
+
+ This is the fine-tuned BERT-base model for the named-entity-recognition task.
+
+ For the BERT-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).
+
+ Sample usage:
+
+ ```python
+ from tokenizers.decoders import WordPiece
+ from transformers import pipeline
+
+ oracle = pipeline('ner', model='dicta-il/dictabert-ner', aggregation_strategy='simple')
+
+ # If we set aggregation_strategy to 'simple', we need to define a decoder for the tokenizer
+ # so that grouped wordpieces are decoded back into words.
+ # Note that the last wordpiece of a group will still be emitted.
+ oracle.tokenizer.backend_tokenizer.decoder = WordPiece()
+
+ sentence = '''דוד בן-גוריון (16 באוקטובר 1886 - ו' בכסלו תשל"ד) היה מדינאי ישראלי וראש הממשלה הראשון של מדינת ישראל.'''
+ oracle(sentence)
+ ```
+
+ Output:
+ ```json
+ [
+   {
+     "entity_group": "PER",
+     "score": 0.9999443,
+     "word": "דוד בן - גוריון",
+     "start": 0,
+     "end": 13
+   },
+   {
+     "entity_group": "TIMEX",
+     "score": 0.99987966,
+     "word": "16 באוקטובר 1886",
+     "start": 15,
+     "end": 31
+   },
+   {
+     "entity_group": "TIMEX",
+     "score": 0.9998579,
+     "word": "ו' בכסלו תשל\"ד",
+     "start": 34,
+     "end": 48
+   },
+   {
+     "entity_group": "TTL",
+     "score": 0.99963045,
+     "word": "וראש הממשלה",
+     "start": 68,
+     "end": 79
+   },
+   {
+     "entity_group": "GPE",
+     "score": 0.9997943,
+     "word": "ישראל",
+     "start": 96,
+     "end": 101
+   }
+ ]
+ ```
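
The `start` and `end` fields in the output above are character offsets into the input sentence, while `word` is the decoded token text (note the spaces around the hyphen in the first entity). As a minimal sketch, assuming you want the exact surface form of each entity, you can slice the original sentence with those offsets. The `entities` list below is copied from the sample output and trimmed to the fields used; `surface_forms` is a hypothetical helper name for illustration, not part of `transformers`.

```python
# Sample sentence and a trimmed copy of the pipeline output shown above.
sentence = '''דוד בן-גוריון (16 באוקטובר 1886 - ו' בכסלו תשל"ד) היה מדינאי ישראלי וראש הממשלה הראשון של מדינת ישראל.'''

entities = [
    {"entity_group": "PER", "start": 0, "end": 13},
    {"entity_group": "TIMEX", "start": 15, "end": 31},
    {"entity_group": "TIMEX", "start": 34, "end": 48},
    {"entity_group": "TTL", "start": 68, "end": 79},
    {"entity_group": "GPE", "start": 96, "end": 101},
]


def surface_forms(text, ents):
    """Return (label, exact substring) pairs using the character offsets."""
    return [(e["entity_group"], text[e["start"]:e["end"]]) for e in ents]


# Hypothetical helper applied to the sample data above.
spans = surface_forms(sentence, entities)
for label, span in spans:
    print(label, span)
```

With the sample data, the first entity comes back as `דוד בן-גוריון`, without the spaces around the hyphen that appear in the decoded `word` field.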
+
+ ## Citation
+
+ If you use DictaBERT in your research, please cite "DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew".
+
+ **BibTeX:**
+
+ ```bibtex
+ @misc{shmidman2023dictabert,
+       title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
+       author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
+       year={2023},
+       eprint={2308.16687},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL}
+ }
+ ```
+
+ ## License
+
+ Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
+
+ This work is licensed under a
+ [Creative Commons Attribution 4.0 International License][cc-by].
+
+ [![CC BY 4.0][cc-by-image]][cc-by]
+
+ [cc-by]: http://creativecommons.org/licenses/by/4.0/
+ [cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
+ [cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg
+
+