Azerbaijani Named Entity Recognition (NER) Model

This repository contains the code and model for Named Entity Recognition (NER) in the Azerbaijani language. The model is built on the XLM-RoBERTa architecture and fine-tuned on a custom dataset.

Model Description

The model recognizes the following entity types:

  • LABEL_0: O: Outside any named entity
  • LABEL_1: PERSON: Names of individuals
  • LABEL_2: LOCATION: Geographical locations, both man-made and natural
  • LABEL_3: ORGANISATION: Names of companies, institutions
  • LABEL_4: DATE: Dates or periods
  • LABEL_5: TIME: Times of the day
  • LABEL_6: MONEY: Monetary values
  • LABEL_7: PERCENTAGE: Percentage values
  • LABEL_8: FACILITY: Buildings, airports, etc.
  • LABEL_9: PRODUCT: Products and goods
  • LABEL_10: EVENT: Events and occurrences
  • LABEL_11: ART: Artworks, titles of books, songs
  • LABEL_12: LAW: Legal documents
  • LABEL_13: LANGUAGE: Languages
  • LABEL_14: GPE: Countries, cities, states
  • LABEL_15: NORP: Nationalities or religious or political groups
  • LABEL_16: ORDINAL: Ordinal numbers
  • LABEL_17: CARDINAL: Cardinal numbers
  • LABEL_18: DISEASE: Diseases and medical conditions
  • LABEL_19: CONTACT: Contact information, e.g., phone numbers, emails
  • LABEL_20: ADAGE: Proverbs, sayings
  • LABEL_21: QUANTITY: Measurements and quantities
  • LABEL_22: MISCELLANEOUS: Miscellaneous entities
  • LABEL_23: POSITION: Professional or social positions
  • LABEL_24: PROJECT: Names of projects or programs
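The label list above can be turned into the two lookup tables that transformers model configs conventionally use (id2label and label2id). The sketch below is only an illustration: the names ENTITY_NAMES, id2label and label2id are chosen here and are not taken from the repository's code.

```python
# Entity names in label-index order, copied from the list in this model card.
# ENTITY_NAMES is an illustrative name, not something shipped with the model.
ENTITY_NAMES = [
    "O", "PERSON", "LOCATION", "ORGANISATION", "DATE",
    "TIME", "MONEY", "PERCENTAGE", "FACILITY", "PRODUCT",
    "EVENT", "ART", "LAW", "LANGUAGE", "GPE",
    "NORP", "ORDINAL", "CARDINAL", "DISEASE", "CONTACT",
    "ADAGE", "QUANTITY", "MISCELLANEOUS", "POSITION", "PROJECT",
]

# Index -> name and name -> index lookups.
id2label = dict(enumerate(ENTITY_NAMES))
label2id = {name: idx for idx, name in id2label.items()}

print(len(id2label))        # 25
print(id2label[14])         # GPE
print(label2id["PROJECT"])  # 24
```

If these dictionaries are assigned to model.config.id2label and model.config.label2id before a pipeline is created, the pipeline should report the readable names instead of LABEL_n. This has not been verified against this particular checkpoint, so treat it as an assumption.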

Installation

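To use the model, install the required libraries with pip. The usage code relies on XLMRobertaForTokenClassification, which is a PyTorch class, so PyTorch must be installed as well:

pip install transformers
pip install datasets
pip install torch

Usage

from transformers import pipeline, XLMRobertaTokenizerFast, XLMRobertaForTokenClassification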

# Load the model and tokenizer
tokenizer = XLMRobertaTokenizerFast.from_pretrained("LocalDoc/ner_azerbaijan")
model = XLMRobertaForTokenClassification.from_pretrained("LocalDoc/ner_azerbaijan")

# Create NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = "Komitədən bildirilib ki, sovet dövründə Azərbaycanda cəmi 17 məscid fəaliyyət göstərirdisə, dövlət müstəqilliyinin bərpasından sonra ölkədə 814 məscid tikilib."

# Perform NER
ner_results = nlp(example)

# Mapping of label indices to their descriptions
label_mapping = {
    0: "O",
    1: "PERSON",
    2: "LOCATION",
    3: "ORGANISATION",
    4: "DATE",
    5: "TIME",
    6: "MONEY",
    7: "PERCENTAGE",
    8: "FACILITY",
    9: "PRODUCT",
    10: "EVENT",
    11: "ART",
    12: "LAW",
    13: "LANGUAGE",
    14: "GPE",
    15: "NORP",
    16: "ORDINAL",
    17: "CARDINAL",
    18: "DISEASE",
    19: "CONTACT",
    20: "ADAGE",
    21: "QUANTITY",
    22: "MISCELLANEOUS",
    23: "POSITION",
    24: "PROJECT"
}

# Print results with mapped entity types
for result in ner_results:
    entity_group = result['entity_group']
    entity_description = label_mapping[int(entity_group.split('_')[-1])]
    print({
        'entity_group': entity_description,
        'score': result['score'],
        'word': result['word'],
        'start': result['start'],
        'end': result['end']
    })
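Because the model exposes its labels as LABEL_n, the pipeline's default filter for the "O" tag probably does not catch LABEL_0, so spans outside any entity may show up in the results. The sketch below shows one way to drop those spans and collect the rest by entity type. group_entities and its min_score parameter are hypothetical helpers introduced here, and demo_results is a hand-written stand-in for pipeline output, not real model predictions.

```python
from collections import defaultdict


def group_entities(ner_results, label_mapping, min_score=0.0):
    """Group pipeline-style results by readable entity type.

    Spans labelled "O" and spans scoring below min_score are skipped.
    """
    grouped = defaultdict(list)
    for item in ner_results:
        # "LABEL_14" -> 14 -> "GPE"
        label_id = int(item["entity_group"].rsplit("_", 1)[-1])
        name = label_mapping[label_id]
        if name == "O" or item["score"] < min_score:
            continue
        grouped[name].append(item["word"])
    return dict(grouped)


# Hand-written placeholder input (NOT real model output); it contains
# only the fields that group_entities reads.
demo_results = [
    {"entity_group": "LABEL_0", "score": 0.99, "word": "sovet dövründə"},
    {"entity_group": "LABEL_14", "score": 0.95, "word": "Azərbaycanda"},
    {"entity_group": "LABEL_17", "score": 0.40, "word": "17"},
    {"entity_group": "LABEL_17", "score": 0.90, "word": "814"},
]
# Subset of the label table from this model card, enough for the demo.
demo_mapping = {0: "O", 14: "GPE", 17: "CARDINAL"}

print(group_entities(demo_results, demo_mapping, min_score=0.5))
# {'GPE': ['Azərbaycanda'], 'CARDINAL': ['814']}
```

With the real pipeline, the same helper could be called as group_entities(ner_results, label_mapping), reusing the two variables defined in the example above.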

License

This model is licensed under the CC BY-NC-ND 4.0 license, which sets the following conditions:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
Non-Commercial: You may not use the material for commercial purposes.
No Derivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.

For more information, please refer to the CC BY-NC-ND 4.0 license.

Contact

For more information, questions, or issues, please contact LocalDoc at [[email protected]].
