Initialize

Files changed (9)

README.md +90 -0
config.json +72 -0
merges.txt +0 -0
pytorch_model.bin +3 -0
special_tokens_map.json +15 -0
tf_model.h5 +3 -0
tokenizer.json +0 -0
tokenizer_config.json +14 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,90 @@

+---
+language: fa
+---
+# RobertaNER
+This model is fine-tuned for the Named Entity Recognition (NER) task on a mixed NER dataset collected from [ARMAN](https://github.com/HaniehP/PersianNER), [PEYMA](http://nsurl.org/2019-2/tasks/task-7-named-entity-recognition-ner-for-farsi/), and [WikiANN](https://elisa-ie.github.io/wikiann/), covering ten entity types:
+- Date (DAT)
+- Event (EVE)
+- Facility (FAC)
+- Location (LOC)
+- Money (MON)
+- Organization (ORG)
+- Percent (PCT)
+- Person (PER)
+- Product (PRO)
+- Time (TIM)
+## Dataset Information
+|       |   Records |   B-DAT |   B-EVE |   B-FAC |   B-LOC |   B-MON |   B-ORG |   B-PCT |   B-PER |   B-PRO |   B-TIM |   I-DAT |   I-EVE |   I-FAC |   I-LOC |   I-MON |   I-ORG |   I-PCT |   I-PER |   I-PRO |   I-TIM |
+|:------|----------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
+| Train |     29133 |    1423 |    1487 |    1400 |   13919 |     417 |   15926 |     355 |   12347 |    1855 |     150 |    1947 |    5018 |    2421 |    4118 |    1059 |   19579 |     573 |    7699 |    1914 |     332 |
+| Valid |      5142 |     267 |     253 |     250 |    2362 |     100 |    2651 |      64 |    2173 |     317 |      19 |     373 |     799 |     387 |     717 |     270 |    3260 |     101 |    1382 |     303 |      35 |
+| Test  |      6049 |     407 |     256 |     248 |    2886 |      98 |    3216 |      94 |    2646 |     318 |      43 |     568 |     888 |     408 |     858 |     263 |    3967 |     141 |    1707 |     296 |      78 |
+## Evaluation
+The following tables summarize the scores obtained by the model, overall and per entity class, on the test set.
+**Overall**
+|    Model   | accuracy | precision |  recall  |    f1    |
+|:----------:|:--------:|:---------:|:--------:|:--------:|
+|   Roberta  | 0.994849 |  0.949816 | 0.960235 | 0.954997 |
+**Per entity**
+|     	| number 	| precision 	|  recall  	|    f1    	|
+|:---:	|:------:	|:---------:	|:--------:	|:--------:	|
+| DAT 	|   407  	|  0.844869 	| 0.869779 	| 0.857143 	|
+| EVE 	|   256  	|  0.948148 	| 1.000000 	| 0.973384 	|
+| FAC 	|   248  	|  0.957529 	| 1.000000 	| 0.978304 	|
+| LOC 	|  2884  	|  0.965422 	| 0.968100 	| 0.966759 	|
+| MON 	|   98   	|  0.937500 	| 0.918367 	| 0.927835 	|
+| ORG 	|  3216  	|  0.943662 	| 0.958333 	| 0.950941 	|
+| PCT 	|   94   	|  1.000000 	| 0.968085 	| 0.983784 	|
+| PER 	|  2646  	|  0.957030 	| 0.959562 	| 0.958294 	|
+| PRO 	|   318  	|  0.963636 	| 1.000000 	| 0.981481 	|
+| TIM 	|   43   	|  0.739130 	| 0.790698 	| 0.764045 	|
+## How To Use
+You can use this model with the Transformers pipeline for NER.
+### Installing requirements
+```bash
+pip install transformers
+```
+### How to predict using pipeline
+```python
+from transformers import AutoTokenizer
+from transformers import AutoModelForTokenClassification  # for PyTorch
+from transformers import TFAutoModelForTokenClassification  # for TensorFlow
+from transformers import pipeline
+model_name_or_path = "HooshvareLab/roberta-fa-zwnj-base-ner"
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # PyTorch
+# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # TensorFlow
+nlp = pipeline("ner", model=model, tokenizer=tokenizer)  # token-classification pipeline
+example = "در سال ۲۰۱۳ درگذشت و آندرتیکر و کین برای او مراسم یادبود گرفتند."
+ner_results = nlp(example)  # list of dicts, one per tagged token
+print(ner_results)
+```
+## Questions?
+Post a GitHub issue on the [ParsNER Issues](https://github.com/hooshvare/parsner/issues) repo.
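
Note on the evaluation tables in README.md: the f1 column should be the harmonic mean of the precision and recall columns. The sketch below recomputes it for a few rows as a sanity check. The helper name `f1_score` and the choice of rows are illustrative and not part of the commit; the numbers are copied from the tables.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the README tables
reported = {
    "overall": (0.949816, 0.960235, 0.954997),
    "DAT": (0.844869, 0.869779, 0.857143),
    "TIM": (0.739130, 0.790698, 0.764045),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed={f1_score(p, r):.6f} reported={f1:.6f}")
```

The recomputed values should agree with the reported ones up to rounding in the last digit.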

config.json ADDED

	@@ -0,0 +1,72 @@

+{
+  "architectures": [
+    "RobertaForTokenClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "finetuning_task": "ner",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "O",
+    "1": "B-DAT",
+    "2": "B-EVE",
+    "3": "B-FAC",
+    "4": "B-LOC",
+    "5": "B-MON",
+    "6": "B-ORG",
+    "7": "B-PCT",
+    "8": "B-PER",
+    "9": "B-PRO",
+    "10": "B-TIM",
+    "11": "I-DAT",
+    "12": "I-EVE",
+    "13": "I-FAC",
+    "14": "I-LOC",
+    "15": "I-MON",
+    "16": "I-ORG",
+    "17": "I-PCT",
+    "18": "I-PER",
+    "19": "I-PRO",
+    "20": "I-TIM"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "B-DAT": 1,
+    "B-EVE": 2,
+    "B-FAC": 3,
+    "B-LOC": 4,
+    "B-MON": 5,
+    "B-ORG": 6,
+    "B-PCT": 7,
+    "B-PER": 8,
+    "B-PRO": 9,
+    "B-TIM": 10,
+    "I-DAT": 11,
+    "I-EVE": 12,
+    "I-FAC": 13,
+    "I-LOC": 14,
+    "I-MON": 15,
+    "I-ORG": 16,
+    "I-PCT": 17,
+    "I-PER": 18,
+    "I-PRO": 19,
+    "I-TIM": 20,
+    "O": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 514,
+  "model_type": "roberta",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.5.0.dev0",
+  "type_vocab_size": 1,
+  "use_cache": true,
+  "vocab_size": 42000
+}
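
Note on `id2label`: the mapping follows a BIO scheme, with id 0 for `O`, ids 1 to 10 for `B-*` tags, and ids 11 to 20 for `I-*` tags. As a minimal sketch of how predicted label ids could be turned into entity spans, the snippet below rebuilds that mapping and merges consecutive tags. The function `group_entities` and the English tokens are hypothetical illustrations, not model output and not code from this repository.

```python
ENTITY_TYPES = ["DAT", "EVE", "FAC", "LOC", "MON", "ORG", "PCT", "PER", "PRO", "TIM"]

# Rebuild the mapping from config.json: 0 -> "O", 1..10 -> B-*, 11..20 -> I-*
ID2LABEL = {0: "O"}
for i, ent in enumerate(ENTITY_TYPES):
    ID2LABEL[i + 1] = f"B-{ent}"
    ID2LABEL[i + 11] = f"I-{ent}"


def group_entities(tokens, label_ids):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, label_id in zip(tokens, label_ids):
        prefix, _, ent = ID2LABEL[label_id].partition("-")
        if prefix == "I" and ent == current_type:
            # Continuation of the span that is currently open
            current_tokens.append(token)
            continue
        if current_type is not None:
            # Any other tag closes the open span
            spans.append((current_type, " ".join(current_tokens)))
        if prefix == "O":
            current_type, current_tokens = None, []
        else:
            # B-* (or a stray I-*) starts a new span
            current_type, current_tokens = ent, [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical, hand-labelled example (not produced by the model)
tokens = ["Ali", "Daei", "visited", "Tehran", "in", "2013"]
label_ids = [8, 18, 0, 4, 0, 1]
print(group_entities(tokens, label_ids))
```

Treating a stray `I-*` tag as the start of a new span is one possible design choice; a stricter decoder might discard such tags instead.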

merges.txt ADDED

The diff for this file is too large to render. See raw diff

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:12b5dbde0fd2e2cdabb99cf07b95d241643031398bbd310cf7bdf1adaf86239f
+size 470984439
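
Note on the three lines above: they are a Git LFS pointer, not the weights themselves. The pointer records the SHA-256 digest and byte size of the real file. As a rough sketch, a downloaded `pytorch_model.bin` could be checked against the pointer as follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library.

```python
import hashlib

# Pointer text copied from the diff above
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:12b5dbde0fd2e2cdabb99cf07b95d241643031398bbd310cf7bdf1adaf86239f
size 470984439
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so a large checkpoint is never fully held in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy (the path is an assumption) would then confirm that the download is complete and uncorrupted.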

special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+    "bos_token": "<s>",
+    "eos_token": "</s>",
+    "unk_token": "<unk>",
+    "sep_token": "</s>",
+    "pad_token": "<pad>",
+    "cls_token": "<s>",
+    "mask_token": {
+        "content": "<mask>",
+        "single_word": false,
+        "lstrip": true,
+        "rstrip": false,
+        "normalized": false
+    }
+}

tf_model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a36362f6ea7a50a303318bbd8df420227bf2c665e0dca9cc07ab9815ecf77b9e
+size 471165616

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+    "unk_token": "<unk>",
+    "bos_token": "<s>",
+    "eos_token": "</s>",
+    "add_prefix_space": true,
+    "errors": "replace",
+    "sep_token": "</s>",
+    "cls_token": "<s>",
+    "pad_token": "<pad>",
+    "mask_token": "<mask>",
+    "model_max_length": 512,
+    "special_tokens_map_file": null,
+    "name_or_path": "HooshvareLab/roberta-fa-zwnj-base"
+}
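
Note on `model_max_length`: the tokenizer limit of 512 and the `max_position_embeddings` value of 514 in config.json are consistent if, as in the usual RoBERTa convention, position ids start at `pad_token_id + 1`. That offset is an assumption about the model implementation, not something stated in this commit. The arithmetic can be sketched as below; both helper functions are hypothetical.

```python
# Values copied from config.json and tokenizer_config.json in this commit
MAX_POSITION_EMBEDDINGS = 514
PAD_TOKEN_ID = 1
MODEL_MAX_LENGTH = 512


def usable_positions(max_position_embeddings, pad_token_id):
    """Positions left when ids 0..pad_token_id are reserved (assumed RoBERTa-style offset)."""
    return max_position_embeddings - (pad_token_id + 1)


def max_content_tokens(model_max_length, num_special_tokens=2):
    """Tokens left for text once <s> and </s> wrap a single sequence."""
    return model_max_length - num_special_tokens


print(usable_positions(MAX_POSITION_EMBEDDINGS, PAD_TOKEN_ID))
print(max_content_tokens(MODEL_MAX_LENGTH))
```

Under these assumptions, inputs longer than 510 subword tokens would need truncation or splitting before they reach the model.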

vocab.json ADDED

The diff for this file is too large to render. See raw diff