arkodeep/spam-classfication-model

metadata

license: wtfpl
datasets:
  - arkodeep/spam-data
language:
  - en
tags:
  - spam
  - spam classification
  - text
  - spam detection
  - text classification

Spam Detection System

Lite Model

Introduction

The Lite model is a streamlined pipeline with optimized parameters and additional hand-crafted features, designed for fast, efficient spam detection.

Features

Text Preprocessing: Lemmatization, removal of stop words and punctuation.
Feature Extraction: Text length, word count, unique word count, uppercase count, special character count.
Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
Metrics Saving: Accuracy, precision, and F1 score.
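The five hand-crafted features listed above could be computed roughly as follows. This is a minimal sketch, not the project's actual code: the function name `extract_features`, the dictionary keys, and the exact definitions (uppercase characters rather than uppercase words, `string.punctuation` as the set of special characters) are assumptions made for illustration.

```python
import string


def extract_features(text):
    """Return the five hand-crafted features for one message (hypothetical helper)."""
    words = text.split()
    return {
        # Number of characters, including spaces and punctuation.
        "text_length": len(text),
        # Whitespace-separated tokens.
        "word_count": len(words),
        # Distinct tokens, compared case-insensitively.
        "unique_word_count": len({w.lower() for w in words}),
        # Assumption: counts uppercase characters, not fully uppercase words.
        "uppercase_count": sum(1 for ch in text if ch.isupper()),
        # Assumption: "special characters" means ASCII punctuation.
        "special_char_count": sum(1 for ch in text if ch in string.punctuation),
    }


features = extract_features("WIN a FREE prize now!!! Call now")
```

Counts like these are typically appended to the vectorized text so the classifier can pick up on signals such as heavy capitalization or repeated punctuation.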

How to Run

Train the Model:
```
python training/train_model_lite.py
```

Use the Model:

```
# Load the trained model and the fitted vectorizer from disk
import joblib

model = joblib.load('models/model.pkl')
vectorizer = joblib.load('models/vectorizer.pkl')
```

Legacy Model

Introduction

The Legacy model keeps the original, unoptimized model logic, but updates the code structure and adds visualizations.

Features

Text Preprocessing: Porter Stemming, removal of stop words and punctuation.
Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
Metrics Saving: Accuracy and precision.
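An ensemble of the three classifiers named above could be assembled with scikit-learn's `VotingClassifier`, as in the sketch below. The voting scheme (hard voting), the default hyperparameters, the use of `TfidfVectorizer`, and the toy corpus are all assumptions for illustration; the training scripts may combine the models differently.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC


def build_ensemble(random_state=0):
    """Combine SVC, MultinomialNB, and ExtraTreesClassifier by majority vote."""
    return VotingClassifier(
        estimators=[
            ("svc", SVC(random_state=random_state)),
            ("nb", MultinomialNB()),
            ("et", ExtraTreesClassifier(n_estimators=50, random_state=random_state)),
        ],
        voting="hard",  # assumption: each model casts one vote per message
    )


# Toy corpus made up for this sketch (1 = spam, 0 = ham).
texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent call now to win",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "thanks for sending the notes",
]
labels = [1, 1, 1, 0, 0, 0]

# Fit the vectorizer on the training text, then train the ensemble on its output.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = build_ensemble().fit(X, labels)

# New messages must go through the same fitted vectorizer before prediction.
prediction = model.predict(vectorizer.transform(["free prize waiting for you"]))
```

Hard voting only needs each model's predicted label, so it works with `SVC` without enabling probability estimates.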

How to Run

Train the Model:
```
python training/train_model_legacy.py
```

Use the Model:

```
# Load the trained model and the fitted vectorizer from disk
import joblib

model = joblib.load('models/model.pkl')
vectorizer = joblib.load('models/vectorizer.pkl')
```
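Once loaded, the two objects would typically be used together: the vectorizer turns raw text into features, and the model predicts a label for each row. The sketch below assumes the saved model accepts the vectorizer's output directly and follows the usual scikit-learn `transform` / `predict` interface. The helper names `load_artifacts` and `classify_messages` are hypothetical, and any preprocessing applied during training (stemming, stop-word removal) would need to be repeated on the input first.

```python
def load_artifacts(model_path="models/model.pkl", vectorizer_path="models/vectorizer.pkl"):
    """Load the pickled model and vectorizer from the paths shown above."""
    # Imported lazily so classify_messages can be used without joblib installed.
    import joblib

    return joblib.load(model_path), joblib.load(vectorizer_path)


def classify_messages(messages, model, vectorizer):
    """Vectorize raw messages and return one predicted label per message."""
    # Assumption: the model was trained on exactly what vectorizer.transform returns.
    features = vectorizer.transform(messages)
    return list(model.predict(features))
```

A typical call would be `model, vectorizer = load_artifacts()` followed by `classify_messages(["Free entry, reply now"], model, vectorizer)`. How the returned labels are encoded (for example 0/1 or "ham"/"spam") depends on how the training script prepared the dataset.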

Additional Information

Dependencies: Python 3.6 or higher, pip, and required packages listed in requirements.txt.
Dataset: The dataset used for training is spam.csv.
Contact and Support: For questions or support, please contact the project maintainers.

For more details, refer to README.md and models.md.