Model Card for yeniguno/bert-uncased-turkish-intent-classification

This is a fine-tuned BERT-based model for Turkish intent classification that assigns each input utterance to one of 82 intent labels. It was trained on a consolidated corpus built from several multilingual intent datasets, translated into Turkish where necessary and normalized.

How to Get Started with the Model

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")
tokenizer = AutoTokenizer.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Şarkıyı çal, Sam."
prediction = pipe(text)

print(prediction)
# [{'label': 'play_music', 'score': 0.999117910861969}]

Uses

This model is intended for Natural Language Understanding (NLU) tasks involving Turkish text, in particular classifying user intents in applications such as:

  • Voice assistants
  • Chatbots
  • Customer support automation
  • Conversational AI systems

Bias, Risks, and Limitations

  • Performance may degrade on intents that are underrepresented in the training data.
  • The model is not optimized for languages other than Turkish.
  • Domain-specific intents not included in the dataset may require additional fine-tuning.
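
One common way to soften the impact of underrepresented or out-of-domain intents in an application is to route low-confidence predictions to a fallback instead of acting on them. The sketch below works on the list-of-dicts output shape that the pipeline returns in the getting-started example. The threshold value, the fallback label, and the helper name `resolve_intent` are hypothetical and not part of the model or its label set; a suitable threshold would need to be tuned on held-out data.

```python
from typing import Dict, List

# Hypothetical values: neither the threshold nor the fallback label
# comes from the model card.
CONFIDENCE_THRESHOLD = 0.70
FALLBACK_LABEL = "fallback"


def resolve_intent(prediction: List[Dict], threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return the highest-scoring label, or FALLBACK_LABEL if its score is below threshold."""
    top = max(prediction, key=lambda p: p["score"])
    return top["label"] if top["score"] >= threshold else FALLBACK_LABEL


# Same shape as the output of pipe(text) in the getting-started example.
confident = [{"label": "play_music", "score": 0.999117910861969}]
uncertain = [{"label": "play_music", "score": 0.41}]  # made-up score for illustration

print(resolve_intent(confident))  # play_music
print(resolve_intent(uncertain))  # fallback
```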

Training Details

Training Data

This model was trained on a combination of intent datasets from various sources, normalized to Turkish:

Datasets Used:

  • mteb/amazon_massive_intent
  • mteb/mtop_intent
  • sonos-nlu-benchmark/snips_built_in_intents
  • Mozilla/smart_intent_dataset
  • Bhuvaneshwari/intent_classification
  • clinc/clinc_oos

Each dataset was preprocessed, translated to Turkish where necessary, and intent labels were consolidated into 82 unique classes.

Dataset Sizes:

  • Training: 150,235
  • Validation: 18,780
  • Test: 18,779
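
As a quick sanity check on the sizes above, the following sketch computes the total number of examples and the share of each split. It suggests an approximately 80/10/10 split, though the card does not state the ratio explicitly. The per-label figure is only an average; the actual label distribution is not reported and is likely uneven.

```python
# Split sizes as reported in the model card.
splits = {"train": 150_235, "validation": 18_780, "test": 18_779}
num_labels = 82

total = sum(splits.values())
fractions = {name: size / total for name, size in splits.items()}

print(total)  # 187794
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")  # train: 80.0%, validation: 10.0%, test: 10.0%

# Average only; says nothing about how balanced the labels are.
avg_train_per_label = splits["train"] / num_labels
print(round(avg_train_per_label))  # 1832
```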

Training Procedure

The model was fine-tuned with the following hyperparameters:

  • Base Model: bert-base-uncased
  • Learning Rate: 3e-5
  • Batch Size: 32
  • Epochs: 5
  • Weight Decay: 0.01
  • Evaluation Strategy: Per epoch
  • Precision: FP32 (mixed precision not used)
  • Hardware: A100
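
To give a sense of the training length these settings imply, the sketch below estimates the number of optimizer steps. It assumes that the batch size of 32 is the effective (global) batch size, that no gradient accumulation was used, and that the last partial batch of each epoch was kept; none of these details are stated in the card.

```python
import math

train_examples = 150_235  # training split size from the model card
batch_size = 32           # assumed to be the effective (global) batch size
epochs = 5

# Round up because the last, partial batch is assumed to be kept.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 4695
print(total_steps)      # 23475
```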

Evaluation

Results

Training and Validation:

Epoch   Training Loss   Validation Loss   Accuracy   F1 Score   Precision   Recall
1       0.3485          0.3438            91.16%     90.56%     90.89%      91.16%
2       0.2262          0.2418            93.73%     93.61%     93.67%      93.73%
3       0.1407          0.2389            94.33%     94.20%     94.23%      94.33%
4       0.1002          0.2390            94.68%     94.59%     94.60%      94.68%
5       0.0588          0.2481            94.87%     94.81%     94.83%      94.87%
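
The table above shows validation loss bottoming out at epoch 3 while accuracy keeps improving through epoch 5, and training loss keeps falling throughout. The sketch below picks the best epoch under each criterion from the reported numbers. The card does not say which checkpoint was kept, so this is only an illustration of how the two selection rules differ.

```python
# (epoch, training loss, validation loss, accuracy %) copied from the table above.
history = [
    (1, 0.3485, 0.3438, 91.16),
    (2, 0.2262, 0.2418, 93.73),
    (3, 0.1407, 0.2389, 94.33),
    (4, 0.1002, 0.2390, 94.68),
    (5, 0.0588, 0.2481, 94.87),
]

best_by_loss = min(history, key=lambda row: row[2])      # lowest validation loss
best_by_accuracy = max(history, key=lambda row: row[3])  # highest validation accuracy

print(best_by_loss[0])      # 3
print(best_by_accuracy[0])  # 5
```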

Test Results:

Metric      Value
Loss        0.2457
Accuracy    94.79%
F1 Score    94.79%
Precision   94.85%
Recall      94.79%