metadata
license: wtfpl
datasets:
  - arkodeep/spam-data
language:
  - en
tags:
  - spam
  - spam classification
  - text
  - spam detection
  - text classification

Spam Detection System

Lite Model

Introduction

The Lite model is a streamlined variant with tuned parameters and additional hand-crafted text features, designed for fast spam detection.

Features

  • Text Preprocessing: Lemmatization, removal of stop words and punctuation.
  • Feature Extraction: Text length, word count, unique word count, uppercase count, special character count.
  • Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
  • Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
  • Metrics Saving: Accuracy, precision, and F1 score.
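The five hand-crafted features in the Feature Extraction item can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function name `extract_features`, the dictionary keys, and the exact counting rules (case-insensitive unique words, per-character uppercase count, ASCII punctuation as "special characters") are all assumptions.

```python
import string


def extract_features(text: str) -> dict:
    """Compute five surface features for one message (hypothetical helper)."""
    words = text.split()
    return {
        "text_length": len(text),
        "word_count": len(words),
        # Assumption: case-insensitive, so "Free" and "free" count once.
        "unique_word_count": len({w.lower() for w in words}),
        # Assumption: counted per character, not per fully uppercase word.
        "uppercase_count": sum(ch.isupper() for ch in text),
        # Assumption: "special" means ASCII punctuation.
        "special_char_count": sum(ch in string.punctuation for ch in text),
    }


features = extract_features("WIN a FREE prize!!! Call now, win big.")
print(features)
```

Features like these are typically appended to the vectorized text as extra columns; whether the training script does so is not stated here.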

How to Run

  1. Train the Model:
    python training/train_model_lite.py
    
  2. Use the Model:
    import joblib

    # Load the trained classifier and the fitted vectorizer from disk.
    model = joblib.load('models/model.pkl')
    vectorizer = joblib.load('models/vectorizer.pkl')
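Once the model and vectorizer are loaded, classifying new messages might look like the sketch below. The helper name `classify_messages` is hypothetical, and the sketch assumes the model accepts the vectorizer's output directly. If the Lite pipeline also expects preprocessed text or the extra hand-crafted features, those steps would need to run first. The label encoding returned by the model is not documented here.

```python
from typing import Any, List, Sequence


def classify_messages(model: Any, vectorizer: Any, messages: Sequence[str]) -> List[Any]:
    """Vectorize raw messages and return one predicted label per message.

    Hypothetical helper: assumes `vectorizer` exposes transform() and
    `model` exposes predict(), as scikit-learn estimators do.
    """
    # transform() (not fit_transform) reuses the vocabulary learned in training.
    features = vectorizer.transform(messages)
    return list(model.predict(features))
```

With the objects loaded above, a call would be `classify_messages(model, vectorizer, ["Congratulations, you won a free prize!"])`.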
    

Legacy Model

Introduction

The Legacy model keeps the original, unoptimized model logic, but updates the code structure and adds visualizations.

Features

  • Text Preprocessing: Porter Stemming, removal of stop words and punctuation.
  • Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
  • Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
  • Metrics Saving: Accuracy and precision.
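Both the Lite and Legacy feature lists describe an ensemble of SVC, MultinomialNB, and ExtraTreesClassifier. One common way to combine them in scikit-learn is a `VotingClassifier`, sketched below. This is an assumption about the general shape, not the project's training code: the voting strategy, the use of `TfidfVectorizer`, every hyperparameter, and the function name `build_ensemble` are placeholders, and the toy messages are invented for illustration.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_ensemble():
    """Return a TF-IDF + hard-voting pipeline over the three classifiers."""
    voter = VotingClassifier(
        estimators=[
            ("svc", SVC(kernel="linear")),  # placeholder hyperparameters
            ("nb", MultinomialNB()),
            ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ],
        # Majority vote; soft voting would require SVC(probability=True).
        voting="hard",
    )
    return make_pipeline(TfidfVectorizer(), voter)


# Toy data for illustration only; not taken from the project's dataset.
texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent: you have won money",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the report",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

clf = build_ensemble().fit(texts, labels)
predictions = clf.predict(["free prize, claim now"])
print(predictions)
```

MultinomialNB requires non-negative inputs, which TF-IDF or count features satisfy.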

How to Run

  1. Train the Model:
    python training/train_model_legacy.py
    
  2. Use the Model:
    import joblib

    # Load the trained classifier and the fitted vectorizer from disk.
    model = joblib.load('models/model.pkl')
    vectorizer = joblib.load('models/vectorizer.pkl')
    

Additional Information

  • Dependencies: Python 3.6 or higher, pip, and required packages listed in requirements.txt.
  • Dataset: The dataset used for training is spam.csv.
  • Contact and Support: For questions or support, please contact the project maintainers.

For more details, see README.md and models.md.