---
license: wtfpl
datasets:
- arkodeep/spam-data
language:
- en
tags:
- spam
- spam classification
- text
- spam detection
- text classification
---

# Spam Detection System

## Lite Model

### Introduction
The Lite model is a streamlined approach with optimized parameters and enhanced feature extraction designed for quick and efficient spam detection.

### Features
- **Text Preprocessing**: Lemmatization, removal of stop words and punctuation.
- **Feature Extraction**: Text length, word count, unique word count, uppercase count, special character count.
- **Model Creation**: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
- **Visualization**: Generates graphs for dataset insights, word clouds, and performance metrics.
- **Metrics Saving**: Accuracy, precision, and F1 score.

### How to Run
1. **Train the Model**:
   ```bash
   python training/train_model_lite.py
   ```
2. **Use the Model**:
   ```python
   import joblib
   model = joblib.load('models/model.pkl')
   vectorizer = joblib.load('models/vectorizer.pkl')
   ```

## Legacy Model

### Introduction
The Legacy model retains the original model logic without optimization but updates the structure and adds visualizations for spam detection.

### Features
- **Text Preprocessing**: Porter Stemming, removal of stop words and punctuation.
- **Model Creation**: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
- **Visualization**: Generates graphs for dataset insights, word clouds, and performance metrics.
- **Metrics Saving**: Accuracy and precision.

### How to Run
1. **Train the Model**:
   ```bash
   python training/train_model_legacy.py
   ```
2. **Use the Model**:
   ```python
   import joblib
   model = joblib.load('models/model.pkl')
   vectorizer = joblib.load('models/vectorizer.pkl')
   ```

### Additional Information
- **Dependencies**: Python 3.6 or higher, pip, and required packages listed in `requirements.txt`.
- **Dataset**: The dataset used for training is `spam.csv`.
- **Contact and Support**: For questions or support, please contact the project maintainers.

For more details, you can refer to the [README.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/README.md) and [models.md](https://github.com/arkodeepsen/spam-filter-mbo/blob/4894a939099e5523f22bf3c2e5b3d763c92a73c6/models.md).
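
The handcrafted features listed for the Lite model (text length, word count, unique word count, uppercase count, special character count) could be computed roughly as in the sketch below. This is an illustration only: the function name `extract_basic_features`, the whitespace tokenization, and the choice of `string.punctuation` as the set of "special characters" are assumptions, and the actual training script may define these features differently.

```python
import string


def extract_basic_features(text: str) -> dict:
    """Hypothetical helper: compute simple surface statistics for one message.

    Assumptions (not confirmed by the project): words are split on
    whitespace, unique words are compared case-insensitively, and
    "special characters" means ASCII punctuation.
    """
    words = text.split()
    return {
        "text_length": len(text),                                # total characters
        "word_count": len(words),                                # whitespace-separated tokens
        "unique_word_count": len({w.lower() for w in words}),    # distinct tokens, case-folded
        "uppercase_count": sum(1 for ch in text if ch.isupper()),
        "special_char_count": sum(1 for ch in text if ch in string.punctuation),
    }


if __name__ == "__main__":
    # Print the feature dictionary for a sample message.
    print(extract_basic_features("WIN a FREE prize now!!!"))
```

If the saved model was trained on these extra columns in addition to the vectorized text, the same features would need to be computed and combined with the vectorizer output in the same order at prediction time; check the training script for the exact layout before relying on this.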