# Model Card for yeniguno/bert-uncased-turkish-intent-classification

This is a fine-tuned BERT-based model for Turkish intent classification, categorizing user utterances into 82 distinct intent labels. It was trained on a consolidated collection of multilingual intent datasets, translated and normalized to Turkish.
## How to Get Started with the Model

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")
tokenizer = AutoTokenizer.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Şarkıyı çal, Sam."  # "Play the song, Sam."
prediction = pipe(text)
print(prediction)
# [{'label': 'play_music', 'score': 0.999117910861969}]
```
## Uses

This model is intended for Natural Language Understanding (NLU) tasks on Turkish text, in particular classifying user intents for applications such as:

- Voice assistants
- Chatbots
- Customer support automation
- Conversational AI systems
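For the chatbot use case, the pipeline output shown above can drive a simple dispatch step. A minimal sketch — the handler names and the fallback value are illustrative assumptions, not part of this model card:

```python
# Route a pipeline prediction (a list of {"label", "score"} dicts) to a
# handler keyed by the top predicted intent label.
def route(prediction, handlers, default="fallback"):
    """Dispatch on the top predicted intent label."""
    label = prediction[0]["label"]
    return handlers.get(label, default)

# Hypothetical handler registry for a chatbot:
handlers = {"play_music": "music_player", "get_weather": "weather_service"}

# A prediction shaped like the pipeline output shown above:
pred = [{"label": "play_music", "score": 0.999}]
print(route(pred, handlers))  # music_player
```

Unknown labels fall through to the default handler, which keeps the dispatch table small while the model's full 82-label space stays covered.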
## Bias, Risks, and Limitations

- Performance may degrade on intents that are underrepresented in the training data.
- The model is not optimized for languages other than Turkish.
- Domain-specific intents not covered by the dataset may require additional fine-tuning.
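One practical mitigation for underrepresented or out-of-domain intents is to treat low-confidence predictions as out of scope. A sketch — the 0.80 threshold and the `out_of_scope` label are illustrative assumptions and should be tuned on validation data:

```python
# Guard against low-confidence predictions by mapping them to a fallback
# label instead of trusting the classifier's top choice.
def classify_with_fallback(prediction, threshold=0.80, fallback="out_of_scope"):
    top = prediction[0]
    if top["score"] < threshold:
        return fallback
    return top["label"]

print(classify_with_fallback([{"label": "play_music", "score": 0.999}]))  # play_music
print(classify_with_fallback([{"label": "play_music", "score": 0.41}]))   # out_of_scope
```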
## Training Details

### Training Data

The model was trained on a combination of intent datasets from several sources, normalized to Turkish.

Datasets used:
- mteb/amazon_massive_intent
- mteb/mtop_intent
- sonos-nlu-benchmark/snips_built_in_intents
- Mozilla/smart_intent_dataset
- Bhuvaneshwari/intent_classification
- clinc/clinc_oos
Each dataset was preprocessed and translated to Turkish where necessary, and the intent labels were consolidated into 82 unique classes.
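The consolidation step amounts to mapping each source dataset's intent names onto one shared label set. A sketch of that mapping — the specific entries below are hypothetical examples, not the card's actual label table:

```python
# Map source-dataset intent names onto the consolidated label set.
# These entries are illustrative; the real mapping covers 82 classes.
LABEL_MAP = {
    "PlayMusic": "play_music",       # hypothetical SNIPS-style label
    "audio:play_song": "play_music", # hypothetical MTOP-style label
    "weather_query": "get_weather",  # hypothetical MASSIVE-style label
}

def consolidate(example):
    """Rewrite an example's label in place; unknown labels pass through."""
    example["label"] = LABEL_MAP.get(example["label"], example["label"])
    return example

print(consolidate({"text": "Şarkıyı çal", "label": "PlayMusic"}))
# {'text': 'Şarkıyı çal', 'label': 'play_music'}
```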
Dataset sizes:

- Training: 150,235 examples
- Validation: 18,780 examples
- Test: 18,779 examples
### Training Procedure

The model was fine-tuned with the following hyperparameters:

- Base model: bert-base-uncased
- Learning rate: 3e-5
- Batch size: 32
- Epochs: 5
- Weight decay: 0.01
- Evaluation strategy: per epoch
- Precision: FP32 (no mixed precision)
- Hardware: NVIDIA A100
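The hyperparameters above map naturally onto a `transformers.TrainingArguments` configuration. A sketch of a plausible setup, expressed as plain keyword arguments — this is an assumption about the shape of the training script, not the authors' actual code:

```python
# Keyword arguments in the shape expected by transformers.TrainingArguments,
# mirroring the hyperparameters listed above.
training_args_kwargs = dict(
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # evaluate once per epoch
    fp16=False,                   # FP32: mixed precision disabled
)

# Usage (sketch): TrainingArguments(output_dir="out", **training_args_kwargs)
print(training_args_kwargs["num_train_epochs"])  # 5
```

Note that newer `transformers` releases rename `evaluation_strategy` to `eval_strategy`; check the installed version before reusing this.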
## Evaluation

### Results

Training and validation:
| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall |
|-------|---------------|-----------------|----------|----------|-----------|--------|
| 1 | 0.3485 | 0.3438 | 91.16% | 90.56% | 90.89% | 91.16% |
| 2 | 0.2262 | 0.2418 | 93.73% | 93.61% | 93.67% | 93.73% |
| 3 | 0.1407 | 0.2389 | 94.33% | 94.20% | 94.23% | 94.33% |
| 4 | 0.1002 | 0.2390 | 94.68% | 94.59% | 94.60% | 94.68% |
| 5 | 0.0588 | 0.2481 | 94.87% | 94.81% | 94.83% | 94.87% |
Test results:

| Metric | Value |
|--------|-------|
| Loss | 0.2457 |
| Accuracy | 94.79% |
| F1 Score | 94.79% |
| Precision | 94.85% |
| Recall | 94.79% |
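The identical Accuracy and Recall columns suggest the metrics are support-weighted averages over classes, under which weighted recall reduces to overall accuracy. A minimal pure-Python check of that identity — the averaging choice is an inference from the numbers, not stated in the card:

```python
from collections import Counter

def weighted_recall(y_true, y_pred):
    """Support-weighted recall: sum over classes of (support/total) * recall."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        score += (n / total) * (correct / n)
    return score

y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
print(weighted_recall(y_true, y_pred))  # 0.75, equal to accuracy (3 of 4 correct)
```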