metadata
license: mit
base_model: dbmdz/bert-base-turkish-uncased
tags:
  - generated_from_trainer
datasets:
  - turkish-wiki_ner
metrics:
  - f1
model-index:
  - name: bert-base-turkish-uncased-ner
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: turkish-wiki_ner
          type: turkish-wiki_ner
          config: turkish-WikiNER
          split: validation
          args: turkish-WikiNER
        metrics:
          - name: F1
            type: f1
            value: 0.7821495486288537
language:
  - tr
widget:
  - text: Leblebi Mehmet adıyla Galatasarayın sembol futbolcularından oldu.

bert-base-turkish-uncased-ner

This model is a fine-tuned version of dbmdz/bert-base-turkish-uncased on the turkish-wiki_ner dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2603
  • F1: 0.7821

Model description

The model adapts dbmdz/bert-base-turkish-uncased to Turkish named entity recognition (token classification). The training set contains 18,967 samples and the validation set contains 1,000 samples, both derived from Wikipedia data.

For more details on the dataset, see https://huggingface.co/datasets/turkish-nlp-suite/turkish-wikiNER

Labels:

  • CARDINAL
  • DATE
  • EVENT
  • FAC
  • GPE
  • LANGUAGE
  • LAW
  • LOC
  • MONEY
  • NORP
  • ORDINAL
  • ORG
  • PERCENT
  • PERSON
  • PRODUCT
  • QUANTITY
  • TIME
  • TITLE
  • WORK_OF_ART
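Token-classification models usually predict per-token tags rather than bare entity types. The sketch below expands the 19 entity types above into a tag set, assuming the common BIO scheme (an "O" tag plus "B-" and "I-" variants per type). The scheme, the tag order, and the helper name build_bio_tags are assumptions; the authoritative mapping for this model is the id2label entry in its config (model.config.id2label).

```python
ENTITY_TYPES = [
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC",
    "MONEY", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON", "PRODUCT",
    "QUANTITY", "TIME", "TITLE", "WORK_OF_ART",
]

def build_bio_tags(entity_types):
    """Return ["O", "B-X", "I-X", ...] for the given entity types (BIO scheme assumed)."""
    tags = ["O"]
    for entity_type in entity_types:
        tags.extend([f"B-{entity_type}", f"I-{entity_type}"])
    return tags

bio_tags = build_bio_tags(ENTITY_TYPES)

# Hypothetical id <-> label mappings; the real model's order may differ.
id2label = dict(enumerate(bio_tags))
label2id = {label: idx for idx, label in id2label.items()}

print(len(ENTITY_TYPES), len(bio_tags))
```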

Fine-tuning process: https://github.com/saribasmetehan/bert-base-turkish-uncased-ner

Example

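from transformers import pipeline
import pandas as pd

# English: "If this sum is zero, Newton's first law says the object's state of motion will not change."
text = "Bu toplam sıfır ise, Newton'ın birinci yasası cismin hareket durumunun değişmeyeceğini söyler."
model_id = "saribasmetehan/bert-base-turkish-uncased-ner"

# Build a token-classification pipeline; "simple" aggregation merges
# word-piece tokens into whole entity spans.
ner = pipeline("ner", model=model_id, aggregation_strategy="simple")
preds = ner(text)

# Show the predicted entities as a table.
pd.DataFrame(preds)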
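With "simple" aggregation, the pipeline returns a list of dictionaries with the keys entity_group, score, word, start, and end. A minimal sketch of post-processing that output follows: it groups entity text by label and drops low-confidence spans. The helper name group_entities, the 0.5 threshold, and the sample predictions are illustrative assumptions, not actual output of this model.

```python
from collections import defaultdict

def group_entities(preds, min_score=0.5):
    """Group entity text by label, skipping spans scored below min_score."""
    grouped = defaultdict(list)
    for pred in preds:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)

# Hypothetical predictions in the pipeline's output format (not real model output).
sample_preds = [
    {"entity_group": "PERSON", "score": 0.98, "word": "newton", "start": 21, "end": 27},
    {"entity_group": "ORDINAL", "score": 0.91, "word": "birinci", "start": 31, "end": 38},
    {"entity_group": "LAW", "score": 0.32, "word": "yasası", "start": 39, "end": 45},
]

print(group_entities(sample_preds))
```

In real use, the list returned by the pipeline call would replace sample_preds.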

Load model directly

from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "saribasmetehan/bert-base-turkish-uncased-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
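When the model is loaded directly, predictions come back per word piece rather than per word, so continuation pieces (prefixed with "##" in BERT's WordPiece vocabulary) need to be merged. The sketch below shows one simple way to do that, keeping the label of the first piece of each word. The helper name merge_wordpieces and the sample tokens and labels are hypothetical; in practice the tokens would come from tokenizer.convert_ids_to_tokens and the labels from the argmax of the model's logits mapped through model.config.id2label.

```python
def merge_wordpieces(tokens, labels, special_tokens=("[CLS]", "[SEP]", "[PAD]")):
    """Merge BERT word pieces into words, keeping the first piece's label."""
    words = []
    for token, label in zip(tokens, labels):
        if token in special_tokens:
            continue  # special tokens carry no entity information
        if token.startswith("##") and words:
            # Continuation piece: append to the previous word, keep its label.
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            words.append((token, label))
    return words

# Hypothetical tokens and labels (not real model output; BIO-style tags assumed).
tokens = ["[CLS]", "galata", "##saray", "istanbul", "##da", "[SEP]"]
labels = ["O", "B-ORG", "I-ORG", "B-GPE", "I-GPE", "O"]

print(merge_wordpieces(tokens, labels))
```

Taking the first piece's label is a design choice for brevity; the pipeline's other aggregation strategies combine piece scores differently.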

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 4
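These settings can be checked against the step counts reported in the training results. The sketch below derives the number of optimizer steps from the dataset size and batch size, and evaluates a linear learning-rate decay at each epoch boundary. It assumes a single device, no gradient accumulation, no warmup, and that the final partial batch is kept; none of these is stated in this card.

```python
import math

TRAIN_SAMPLES = 18_967  # training set size from the model description
BATCH_SIZE = 16         # train_batch_size
NUM_EPOCHS = 4          # num_epochs
BASE_LR = 2e-05         # learning_rate

# One optimizer step per batch; the last, smaller batch still counts.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

def linear_lr(step, base_lr=BASE_LR, total=total_steps):
    """Learning rate after `step` updates under linear decay to zero (no warmup assumed)."""
    return base_lr * max(0.0, (total - step) / total)

print(steps_per_epoch, total_steps)
for epoch in range(NUM_EPOCHS + 1):
    print(epoch, linear_lr(epoch * steps_per_epoch))
```

Under these assumptions the script gives 1186 steps per epoch and 4744 steps in total, which agrees with the Step column of the training results.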

Training results

Training Loss   Epoch   Step   Validation Loss   F1
0.4             1.0     1186   0.2502            0.7703
0.2227          2.0     2372   0.2439            0.7740
0.1738          3.0     3558   0.2511            0.7783
0.1474          4.0     4744   0.2603            0.7821
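Validation loss and F1 do not select the same epoch in these results, which matters when choosing a checkpoint. The short sketch below, using only the numbers reported above, finds the best epoch under each criterion; the variable names are illustrative.

```python
# (epoch, training loss, validation loss, F1) copied from the training results.
results = [
    (1, 0.4000, 0.2502, 0.7703),
    (2, 0.2227, 0.2439, 0.7740),
    (3, 0.1738, 0.2511, 0.7783),
    (4, 0.1474, 0.2603, 0.7821),
]

best_by_f1 = max(results, key=lambda row: row[3])
best_by_val_loss = min(results, key=lambda row: row[2])

print("Best F1: epoch", best_by_f1[0], "F1 =", best_by_f1[3])
print("Lowest validation loss: epoch", best_by_val_loss[0], "loss =", best_by_val_loss[2])
```

The reported evaluation figures (loss 0.2603, F1 0.7821) correspond to the final epoch, which has the highest F1 even though validation loss is lowest at epoch 2.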

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1