---
license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: HamSpamBERT
    results: []
widget:
  - text: Ok i am on the way to home bye
    example_title: Ham
  - text: >-
      PRIVATE! Your 2004 Account Statement for 07742676969 shows 786 unredeemed
      Bonus Points. To claim call 08719180248 Identifier Code: 45239 Expires
    example_title: Spam
---

# HamSpamBERT

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on a Spam-Ham dataset. It achieves the following results on the evaluation set:

- Loss: 0.0072
- Accuracy: 0.9991
- Precision: 1.0
- Recall: 0.9933
- F1: 0.9966
Example usage:

```python
from transformers import pipeline, BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("udit-k/HamSpamBERT")
model = BertForSequenceClassification.from_pretrained("udit-k/HamSpamBERT")

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
print(classifier("Call this number to win FREE IPL FINAL tickets!!!"))
print(classifier("Call me when you reach home :)"))
```

Output:

```
[{'label': 'LABEL_1', 'score': 0.9999189376831055}]
[{'label': 'LABEL_0', 'score': 0.9999370574951172}]
```
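Note that `sentiment-analysis` is an alias for the `text-classification` pipeline task, so the pipeline can also be built directly from the Hub id without loading the tokenizer and model separately:

```python
from transformers import pipeline

# equivalent shorthand: let the pipeline resolve the tokenizer and model
# from the Hub repository id
classifier = pipeline("text-classification", model="udit-k/HamSpamBERT")
```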

## Model description

This model fine-tunes `bert-base-uncased` on a Spam-Ham dataset, adapting BERT to the binary text-classification task of spam detection. The output labels are:

- LABEL_0 = Ham (not spam)
- LABEL_1 = Spam
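The model config ships with these generic `LABEL_0`/`LABEL_1` names, so a small mapping step (illustrative, not part of this card) makes predictions readable; this reuses the `classifier` from the usage example above:

```python
# illustrative helper, not part of the original card: map the generic
# labels to readable names using the scheme above
LABEL_NAMES = {"LABEL_0": "ham", "LABEL_1": "spam"}

result = classifier("URGENT! You have won a prize, call now to claim!")[0]
print(LABEL_NAMES[result["label"]], round(result["score"], 4))
```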

## Intended uses & limitations

This model can be used to detect spam texts. Its primary limitation is the small corpus: it was trained on about 4,700 rows and evaluated on around 1,200 rows, so results may not transfer to text that differs markedly from the training data.

## Training and evaluation data

- Training split: 80% of the corpus
- Evaluation split: 20% of the corpus
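The source dataset isn't named in this card. Purely as a sketch, an 80/20 split with the `datasets` library could look like the following, using the Hub's `sms_spam` corpus as a stand-in and reusing the training seed of 42 listed below:

```python
from datasets import load_dataset

# assumption: the card does not name its source dataset; the Hub's
# "sms_spam" corpus is used here only as a stand-in example
raw = load_dataset("sms_spam", split="train")

# 80/20 train/eval split; seed 42 matches the training seed listed below
splits = raw.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))  # sizes will differ if the real corpus differs
```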

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 7
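These values map directly onto `TrainingArguments`; a minimal sketch follows. Only `output_dir` and the per-epoch evaluation strategy are assumptions, the rest mirror the list above, and the Adam betas and epsilon are the `Trainer` defaults:

```python
from transformers import TrainingArguments

# values mirror the hyperparameters listed above; output_dir and
# evaluation_strategy are assumptions, not taken from the card
training_args = TrainingArguments(
    output_dir="HamSpamBERT",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=7,
    evaluation_strategy="epoch",  # the card reports metrics once per epoch
)
```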

## Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| No log        | 1.0   | 279  | 0.0492          | 0.9901   | 1.0       | 0.9262 | 0.9617 |
| 0.0635        | 2.0   | 558  | 0.0117          | 0.9982   | 1.0       | 0.9866 | 0.9932 |
| 0.0635        | 3.0   | 837  | 0.0120          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |
| 0.0138        | 4.0   | 1116 | 0.0072          | 0.9991   | 1.0       | 0.9933 | 0.9966 |
| 0.0138        | 5.0   | 1395 | 0.0086          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |
| 0.0007        | 6.0   | 1674 | 0.0090          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |
| 0.0007        | 7.0   | 1953 | 0.0091          | 0.9982   | 0.9933    | 0.9933 | 0.9933 |
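The card doesn't include the metric code; a plausible `compute_metrics` hook for `Trainer` using scikit-learn, treating spam (`LABEL_1`) as the positive class, would look like:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# assumption: a standard compute_metrics hook for Trainer; the card does
# not show the original implementation
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"  # spam (label 1) as the positive class
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```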

## Framework versions

- Transformers 4.30.0
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.13.3