|
--- |
|
tags: |
|
- spacy |
|
- token-classification |
|
- ner |
|
language: |
|
- en |
|
license: mit |
|
model-index: |
|
- name: en_ner_job_postings |
|
results: |
|
- task: |
|
name: NER |
|
type: token-classification |
|
metrics: |
|
- name: NER Precision |
|
type: precision |
|
value: 0.8516398746 |
|
- name: NER Recall |
|
type: recall |
|
value: 0.8569711538 |
|
- name: NER F Score |
|
type: f_score |
|
value: 0.8542971968 |
|
- task: |
|
name: TAG |
|
type: token-classification |
|
metrics: |
|
- name: TAG (XPOS) Accuracy |
|
type: accuracy |
|
value: 0.9734810915 |
|
- task: |
|
name: UNLABELED_DEPENDENCIES |
|
type: token-classification |
|
metrics: |
|
- name: Unlabeled Attachment Score (UAS) |
|
type: f_score |
|
value: 0.9208198801 |
|
- task: |
|
name: LABELED_DEPENDENCIES |
|
type: token-classification |
|
metrics: |
|
- name: Labeled Attachment Score (LAS) |
|
type: f_score |
|
value: 0.9027174273 |
|
- task: |
|
name: SENTS |
|
type: token-classification |
|
metrics: |
|
- name: Sentences F-Score |
|
type: f_score |
|
value: 0.907098331 |
|
library_name: spacy |
|
pipeline_tag: token-classification
|
--- |
|
# Custom spaCy NER Model for "Profession," "Facility," and "Experience" Entities |
|
|
|
### Overview |
|
This spaCy-based Named Entity Recognition (NER) model is custom-trained to recognize and classify "profession," "facility," and "experience" entities in unstructured text such as job postings, making it useful for information extraction and related text-analysis tasks.
|
|
|
### Key Features |
|
- Custom-trained to recognize `PROFESSION`, `FACILITY`, and `EXPERIENCE` entities (85.4 NER F-score; see Accuracy below).
- Suitable for NLP tasks such as information extraction and content categorization.
- Ships as a standard spaCy pipeline, so it integrates into existing spaCy-based workflows (see the batch-processing sketch below).
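
Because the model loads like any other spaCy pipeline, it drops into existing spaCy code unchanged. A minimal sketch of batch processing with `nlp.pipe` (the example texts are illustrative):

```python
import spacy

nlp = spacy.load("en_ner_job_postings")

postings = [
    "Registered nurse needed at City Hospital, 3+ years of experience required.",
    "We are hiring a data engineer for our Berlin office.",
]

# nlp.pipe streams documents in batches, which is faster than calling nlp() per text
for doc in nlp.pipe(postings, batch_size=32):
    print([(ent.text, ent.label_) for ent in doc.ents])
```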
|
|
|
| Feature | Description | |
|
| --- | --- | |
|
| **Name** | `en_ner_job_postings` | |
|
| **Version** | `3.6.0` | |
|
| **spaCy** | `>=3.6.0,<3.7.0` | |
|
| **Default Pipeline** | `tok2vec`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` | |
|
| **Components** | `tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` | |
|
| **Vectors** | 514157 keys, 514157 unique vectors (300 dimensions) | |
|
| **License** | `MIT` | |
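
The default pipeline runs tagging, parsing, and lemmatization alongside NER. If you only need entities, those components can be disabled at load time for faster inference. A sketch that keeps `tok2vec` enabled, since the `ner` component may listen to the shared embedding layer:

```python
import spacy

# Keep tok2vec and ner; skip tagging, parsing, and lemmatization
nlp = spacy.load(
    "en_ner_job_postings",
    disable=["tagger", "parser", "attribute_ruler", "lemmatizer"],
)

doc = nlp("Hiring a forklift operator for our Dallas warehouse, 2 years of experience preferred.")
print([(ent.text, ent.label_) for ent in doc.ents])
```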
|
|
|
|
|
|
|
### Usage

#### Installation

You can install the model with pip:

```bash
pip install en_ner_job_postings
```

#### Example Usage

Here's how to use the model for entity recognition in Python:

```python
import spacy

# Load the custom spaCy NER model
nlp = spacy.load("en_ner_job_postings")

# Process your text
text = "John Smith is a software engineer at ABC Corp, with over 10 years of experience."
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")
```

#### Entity Types

The model recognizes the following custom entity types (the full label set appears under Label Scheme below):

- `PROFESSION`: professions or job titles.
- `FACILITY`: facilities, buildings, or locations.
- `EXPERIENCE`: mentions of work experience, durations, or qualifications.
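
If downstream code only cares about these three custom labels, you can filter `doc.ents` by label. A minimal sketch (the sample text is illustrative):

```python
import spacy

nlp = spacy.load("en_ner_job_postings")
CUSTOM_LABELS = {"PROFESSION", "FACILITY", "EXPERIENCE"}

doc = nlp("Senior nurse with 5+ years of experience, based at Mercy General Hospital.")

# Keep only the custom job-posting entities
custom_ents = [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in CUSTOM_LABELS]
print(custom_ents)
```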
|
### Label Scheme |
|
|
|
<details> |
|
|
|
<summary>View label scheme (116 labels for 3 components)</summary> |
|
|
|
| Component | Labels | |
|
| --- | --- | |
|
| **`tagger`** | `$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, `_SP`, ```` | |
|
| **`parser`** | `ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp` | |
|
| **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `EXPERIENCE`, `FAC`, `FACILITY`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `PROFESSION`, `QUANTITY`, `TIME`, `WORK_OF_ART` | |
|
|
|
</details> |
|
|
|
### Accuracy |
|
|
|
| Type | Score | |
|
| --- | --- | |
|
| `TOKEN_ACC` | 99.86 | |
|
| `TOKEN_P` | 99.57 | |
|
| `TOKEN_R` | 99.58 | |
|
| `TOKEN_F` | 99.57 | |
|
| `TAG_ACC` | 97.35 | |
|
| `SENTS_P` | 92.19 | |
|
| `SENTS_R` | 89.27 | |
|
| `SENTS_F` | 90.71 | |
|
| `DEP_UAS` | 92.08 | |
|
| `DEP_LAS` | 90.27 | |
|
| `ENTS_P` | 85.16 | |
|
| `ENTS_R` | 85.70 | |
|
| `ENTS_F` | 85.43 | |
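
These scores come from spaCy's standard evaluation. To check the pipeline against your own held-out data, you can score it directly in Python; a sketch, assuming gold annotations serialized to a hypothetical `test.spacy` file:

```python
import spacy
from spacy.tokens import DocBin
from spacy.training import Example

nlp = spacy.load("en_ner_job_postings")

# Pair each gold-annotated doc with the model's prediction on the same text
doc_bin = DocBin().from_disk("test.spacy")  # hypothetical evaluation file
examples = [Example(nlp(gold.text), gold) for gold in doc_bin.get_docs(nlp.vocab)]

scores = nlp.evaluate(examples)
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```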