This model is a fine-tuned version of BERT adapted for multi-label classification in the financial regulatory domain. It builds on the pre-trained ProsusAI/finbert model, further fine-tuned on a diverse dataset of financial regulatory texts, so it can assign multiple relevant categories to a piece of text simultaneously (see the usage sketch after the architecture summary below).

Model Architecture

  • Base Model: BERT
  • Pre-trained Model: ProsusAI/finbert
  • Task: Multi-label classification
  • Model Size: ~110M parameters (F32, safetensors)
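
A minimal usage sketch follows. The repository ID below is a placeholder (the card does not state the actual checkpoint name), and the 0.5 per-label threshold is an assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder repository ID -- substitute the actual checkpoint name.
MODEL_ID = "your-org/finbert-regulatory-multilabel"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, problem_type="multi_label_classification"
)
model.eval()

text = "The firm failed to file the required disclosure within the mandated period."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: apply a sigmoid per label and threshold each one independently.
probs = torch.sigmoid(logits).squeeze(0)
predicted = [
    model.config.id2label[i]
    for i, p in enumerate(probs)
    if p >= 0.5  # assumed threshold, not stated in the card
]
print(predicted)
```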

Performance

Performance metrics on the validation set (a computation sketch follows the list):

  • F1 Score: 0.8637
  • ROC AUC: 0.9044
  • Accuracy: 0.6155
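
Metrics like these can be computed for multi-label outputs with scikit-learn, roughly as follows. The micro averaging and the 0.5 threshold are assumptions; the card does not state them:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def multilabel_metrics(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5):
    """probs, labels: arrays of shape (n_samples, n_labels)."""
    preds = (probs >= threshold).astype(int)
    return {
        "f1": f1_score(labels, preds, average="micro"),
        "roc_auc": roc_auc_score(labels, probs, average="micro"),
        # With multilabel inputs, accuracy_score computes subset accuracy:
        # every label in a sample's vector must match exactly.
        "accuracy": accuracy_score(labels, preds),
    }
```

If the accuracy above is subset accuracy, a value well below the F1 score is expected for multi-label problems, since a single wrong label makes the whole sample count as incorrect.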

Limitations and Ethical Considerations

  • This model's performance may vary with the nature of the input text and the label distribution.
  • The training dataset exhibits class imbalance, which can bias predictions toward frequent labels; one common mitigation is sketched below.
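
A common mitigation for label imbalance in multi-label training, though not necessarily the one used for this model, is to weight the positive term of the binary cross-entropy loss per label. A minimal sketch, assuming a multi-hot train_labels tensor:

```python
import torch

def positive_class_weights(labels: torch.Tensor) -> torch.Tensor:
    """labels: multi-hot float tensor of shape (n_samples, n_labels)."""
    # Weight each label by its negative/positive ratio so rare labels
    # contribute more to the loss.
    pos = labels.sum(dim=0)
    neg = labels.shape[0] - pos
    return neg / pos.clamp(min=1)

# train_labels is hypothetical; build it from the training set's label vectors.
# loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=positive_class_weights(train_labels))
```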

Dataset Information

  • Training set: 6,562 samples
  • Validation set: 929 samples
  • Test set: 1,884 samples
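
A minimal sketch of how splits of this shape might be tokenized for multi-label training. The file names and the text/labels column names are assumptions, as the card does not describe the dataset format:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

# Assumed local files; the card does not name the dataset source.
dataset = load_dataset(
    "json",
    data_files={"train": "train.json", "validation": "val.json", "test": "test.json"},
)

def tokenize(batch):
    # "labels" is assumed to already be a multi-hot vector per sample.
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = [[float(x) for x in row] for row in batch["labels"]]
    return enc

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])
```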

Training Details

  • Training Strategy: Fine-tuning BERT with a randomly initialized classification head (see the configuration sketch after this list).
  • Optimizer: Adam
  • Learning Rate: 1e-4
  • Batch Size: 16
  • Number of Epochs: 2
  • Evaluation Strategy: Epoch
  • Weight Decay: 0.01
  • Metric for Best Model: F1 Score
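
These hyperparameters map directly onto Hugging Face's Trainer API. A hedged configuration sketch, reusing the model, tokenized dataset, and multilabel_metrics helper from the earlier sketches; the output directory name is a placeholder:

```python
import numpy as np
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="finbert-regulatory-multilabel",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",       # "evaluation_strategy" on transformers < 4.41
    save_strategy="epoch",       # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    metric_for_best_model="f1",  # key returned by compute_metrics below
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid per label
    return multilabel_metrics(probs, labels)  # helper from the Performance section

# Trainer defaults to AdamW; the card lists Adam, treated here as equivalent
# up to how weight decay is applied.
trainer = Trainer(
    model=model,  # multi-label model from the usage sketch
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
```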