This model is a fine-tuned version of BERT adapted for multi-label classification in the financial regulatory domain. It builds on the pre-trained ProsusAI/finbert model, further fine-tuned on a diverse dataset of financial regulatory texts, so it can assign multiple relevant categories to a piece of text simultaneously (see the usage sketch after the architecture summary below).

Model Architecture

  • Base Model: BERT
  • Pre-trained Model: ProsusAI/finbert
  • Task: Multi-label classification
  • Model Size: ~110M parameters (F32, safetensors)
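
A minimal usage sketch follows. The repository ID below is a placeholder (the card does not state the actual checkpoint name), and the 0.5 per-label threshold is an assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder repository ID -- substitute the actual checkpoint name.
MODEL_ID = "your-org/finbert-regulatory-multilabel"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, problem_type="multi_label_classification"
)
model.eval()

text = "The firm failed to file the required disclosure within the mandated period."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: apply a sigmoid per label and threshold each one independently.
probs = torch.sigmoid(logits).squeeze(0)
predicted = [
    model.config.id2label[i]
    for i, p in enumerate(probs)
    if p >= 0.5  # assumed threshold, not stated in the card
]
print(predicted)
```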

Performance

Performance metrics on the validation set (a computation sketch follows the list):

  • F1 Score: 0.8637
  • ROC AUC: 0.9044
  • Accuracy: 0.6155
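
Metrics like these can be computed for multi-label outputs with scikit-learn, roughly as follows. The micro averaging and the 0.5 threshold are assumptions; the card does not state them:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def multilabel_metrics(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5):
    """probs, labels: arrays of shape (n_samples, n_labels)."""
    preds = (probs >= threshold).astype(int)
    return {
        "f1": f1_score(labels, preds, average="micro"),
        "roc_auc": roc_auc_score(labels, probs, average="micro"),
        # With multilabel inputs, accuracy_score computes subset accuracy:
        # every label in a sample's vector must match exactly.
        "accuracy": accuracy_score(labels, preds),
    }
```

If the accuracy above is subset accuracy, a value well below the F1 score is expected for multi-label problems, since a single wrong label makes the whole sample count as incorrect.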

Limitations and Ethical Considerations

  • This model's performance may vary with the nature of the input text and the label distribution.
  • The training dataset exhibits class imbalance, which can bias predictions toward frequent labels; one common mitigation is sketched below.
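
A common mitigation for label imbalance in multi-label training, though not necessarily the one used for this model, is to weight the positive term of the binary cross-entropy loss per label. A minimal sketch, assuming a multi-hot train_labels tensor:

```python
import torch

def positive_class_weights(labels: torch.Tensor) -> torch.Tensor:
    """labels: multi-hot float tensor of shape (n_samples, n_labels)."""
    # Weight each label by its negative/positive ratio so rare labels
    # contribute more to the loss.
    pos = labels.sum(dim=0)
    neg = labels.shape[0] - pos
    return neg / pos.clamp(min=1)

# train_labels is hypothetical; build it from the training set's label vectors.
# loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=positive_class_weights(train_labels))
```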

Dataset Information

  • Training set: 6,562 samples
  • Validation set: 929 samples
  • Test set: 1,884 samples
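
A minimal sketch of how splits of this shape might be tokenized for multi-label training. The file names and the text/labels column names are assumptions, as the card does not describe the dataset format:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

# Assumed local files; the card does not name the dataset source.
dataset = load_dataset(
    "json",
    data_files={"train": "train.json", "validation": "val.json", "test": "test.json"},
)

def tokenize(batch):
    # "labels" is assumed to already be a multi-hot vector per sample.
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = [[float(x) for x in row] for row in batch["labels"]]
    return enc

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])
```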

Training Details

  • Training Strategy: Fine-tuning BERT with a randomly initialized classification head (see the configuration sketch after this list).
  • Optimizer: Adam
  • Learning Rate: 1e-4
  • Batch Size: 16
  • Number of Epochs: 2
  • Evaluation Strategy: Epoch
  • Weight Decay: 0.01
  • Metric for Best Model: F1 Score
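
These hyperparameters map directly onto Hugging Face's Trainer API. A hedged configuration sketch, reusing the model, tokenized dataset, and multilabel_metrics helper from the earlier sketches; the output directory name is a placeholder:

```python
import numpy as np
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="finbert-regulatory-multilabel",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",       # "evaluation_strategy" on transformers < 4.41
    save_strategy="epoch",       # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    metric_for_best_model="f1",  # key returned by compute_metrics below
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid per label
    return multilabel_metrics(probs, labels)  # helper from the Performance section

# Trainer defaults to AdamW; the card lists Adam, treated here as equivalent
# up to how weight decay is applied.
trainer = Trainer(
    model=model,  # multi-label model from the usage sketch
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
```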