# Model Card for yeniguno/bert-uncased-turkish-intent-classification

This is a fine-tuned BERT-based model for Turkish intent classification, categorizing user utterances into 82 distinct intent labels. It was trained on a consolidated collection of multilingual intent datasets, translated and normalized to Turkish.
## How to Get Started with the Model

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")
tokenizer = AutoTokenizer.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Şarkıyı çal, Sam."  # "Play the song, Sam."
prediction = pipe(text)
print(prediction)
# [{'label': 'play_music', 'score': 0.999117910861969}]
```
## Uses

This model is intended for Natural Language Understanding (NLU) tasks on Turkish text, in particular classifying user intents for applications such as:

- Voice assistants
- Chatbots
- Customer support automation
- Conversational AI systems
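For the chatbot use case, the pipeline output shown above can drive a simple dispatch step. A minimal sketch — the handler names and the fallback value are illustrative assumptions, not part of this model card:

```python
# Route a pipeline prediction (a list of {"label", "score"} dicts) to a
# handler keyed by the top predicted intent label.
def route(prediction, handlers, default="fallback"):
    """Dispatch on the top predicted intent label."""
    label = prediction[0]["label"]
    return handlers.get(label, default)

# Hypothetical handler registry for a chatbot:
handlers = {"play_music": "music_player", "get_weather": "weather_service"}

# A prediction shaped like the pipeline output shown above:
pred = [{"label": "play_music", "score": 0.999}]
print(route(pred, handlers))  # music_player
```

Unknown labels fall through to the default handler, which keeps the dispatch table small while the model's full 82-label space stays covered.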
## Bias, Risks, and Limitations

- Performance may degrade on intents that are underrepresented in the training data.
- The model is not optimized for languages other than Turkish.
- Domain-specific intents not covered by the dataset may require additional fine-tuning.
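One practical mitigation for underrepresented or out-of-domain intents is to treat low-confidence predictions as out of scope. A sketch — the 0.80 threshold and the `out_of_scope` label are illustrative assumptions and should be tuned on validation data:

```python
# Guard against low-confidence predictions by mapping them to a fallback
# label instead of trusting the classifier's top choice.
def classify_with_fallback(prediction, threshold=0.80, fallback="out_of_scope"):
    top = prediction[0]
    if top["score"] < threshold:
        return fallback
    return top["label"]

print(classify_with_fallback([{"label": "play_music", "score": 0.999}]))  # play_music
print(classify_with_fallback([{"label": "play_music", "score": 0.41}]))   # out_of_scope
```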
## Training Details

### Training Data

The model was trained on a combination of intent datasets from several sources, normalized to Turkish.

Datasets used:
- mteb/amazon_massive_intent
- mteb/mtop_intent
- sonos-nlu-benchmark/snips_built_in_intents
- Mozilla/smart_intent_dataset
- Bhuvaneshwari/intent_classification
- clinc/clinc_oos
Each dataset was preprocessed and translated to Turkish where necessary, and the intent labels were consolidated into 82 unique classes.
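The consolidation step amounts to mapping each source dataset's intent names onto one shared label set. A sketch of that mapping — the specific entries below are hypothetical examples, not the card's actual label table:

```python
# Map source-dataset intent names onto the consolidated label set.
# These entries are illustrative; the real mapping covers 82 classes.
LABEL_MAP = {
    "PlayMusic": "play_music",       # hypothetical SNIPS-style label
    "audio:play_song": "play_music", # hypothetical MTOP-style label
    "weather_query": "get_weather",  # hypothetical MASSIVE-style label
}

def consolidate(example):
    """Rewrite an example's label in place; unknown labels pass through."""
    example["label"] = LABEL_MAP.get(example["label"], example["label"])
    return example

print(consolidate({"text": "Şarkıyı çal", "label": "PlayMusic"}))
# {'text': 'Şarkıyı çal', 'label': 'play_music'}
```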
Dataset sizes:

- Training: 150,235 examples
- Validation: 18,780 examples
- Test: 18,779 examples
### Training Procedure

The model was fine-tuned with the following hyperparameters:

- Base model: bert-base-uncased
- Learning rate: 3e-5
- Batch size: 32
- Epochs: 5
- Weight decay: 0.01
- Evaluation strategy: per epoch
- Precision: FP32 (no mixed precision)
- Hardware: NVIDIA A100
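The hyperparameters above map naturally onto a `transformers.TrainingArguments` configuration. A sketch of a plausible setup, expressed as plain keyword arguments — this is an assumption about the shape of the training script, not the authors' actual code:

```python
# Keyword arguments in the shape expected by transformers.TrainingArguments,
# mirroring the hyperparameters listed above.
training_args_kwargs = dict(
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # evaluate once per epoch
    fp16=False,                   # FP32: mixed precision disabled
)

# Usage (sketch): TrainingArguments(output_dir="out", **training_args_kwargs)
print(training_args_kwargs["num_train_epochs"])  # 5
```

Note that newer `transformers` releases rename `evaluation_strategy` to `eval_strategy`; check the installed version before reusing this.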
## Evaluation

### Results

Training and validation:
| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall |
|-------|---------------|-----------------|----------|----------|-----------|--------|
| 1 | 0.3485 | 0.3438 | 91.16% | 90.56% | 90.89% | 91.16% |
| 2 | 0.2262 | 0.2418 | 93.73% | 93.61% | 93.67% | 93.73% |
| 3 | 0.1407 | 0.2389 | 94.33% | 94.20% | 94.23% | 94.33% |
| 4 | 0.1002 | 0.2390 | 94.68% | 94.59% | 94.60% | 94.68% |
| 5 | 0.0588 | 0.2481 | 94.87% | 94.81% | 94.83% | 94.87% |
Test results:

| Metric | Value |
|--------|-------|
| Loss | 0.2457 |
| Accuracy | 94.79% |
| F1 Score | 94.79% |
| Precision | 94.85% |
| Recall | 94.79% |
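The identical Accuracy and Recall columns suggest the metrics are support-weighted averages over classes, under which weighted recall reduces to overall accuracy. A minimal pure-Python check of that identity — the averaging choice is an inference from the numbers, not stated in the card:

```python
from collections import Counter

def weighted_recall(y_true, y_pred):
    """Support-weighted recall: sum over classes of (support/total) * recall."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        score += (n / total) * (correct / n)
    return score

y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
print(weighted_recall(y_true, y_pred))  # 0.75, equal to accuracy (3 of 4 correct)
```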