Spam or Ham

About

Spam, also known as junk email, consists of unwanted or unsolicited messages sent in bulk to users’ accounts. Such emails can clog up inboxes, take up unnecessary disk space, and generally degrade the experience of their recipients. Most major email service providers implement some form of spam filter that automatically routes spam to a junk folder, keeping it from affecting their users.

This project is a data visualization and machine learning solution built on a spam classification dataset from Kaggle. The end product is a web application written in pure Python with the Dash framework. It lets the user generate visualizations of email records and predict whether a given email is spam. The spam classifier, a bi-directional LSTM network, achieved 99% accuracy on test data.

Usage

Running in a Development Environment

Install requirements first: pip install -r requirements.txt

To run the Dash server:

$ python3 runserver.py
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)

After running runserver.py, navigate to http://127.0.0.1:8050/ in a browser to view the application.

Debug Mode

By default, debug mode is turned off. To run the Dash server with debug mode turned on, use the --d or --debug option.

$ python3 runserver.py --d
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on

Running in debug mode turns on Dash DevTools, giving you access to tools such as callback graphs, hot reloading, and in-app error reporting.

Use this for debugging and development purposes.
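
As a rough illustration of how a flag like this can be wired up, the sketch below uses the standard library's argparse to accept both --d and --debug as the same boolean option. The function name parse_args and the help text are assumptions made for this example; the actual runserver.py may parse its options differently.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a runserver-style script (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run the Dash server.")
    # Both spellings map to the same boolean; without the flag, debug stays off.
    parser.add_argument(
        "--d", "--debug",
        dest="debug",
        action="store_true",
        help="run the server with debug mode (Dash DevTools) turned on",
    )
    return parser.parse_args(argv)


# Explicit argument lists stand in for the real command line.
print(parse_args([]).debug)           # False
print(parse_args(["--d"]).debug)      # True
print(parse_args(["--debug"]).debug)  # True
```

The resulting boolean would then be handed to the Dash app when the server is started, so that DevTools are only enabled on request.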

Deploying to a Production Environment

In a development environment, the Dash app can be easily accessed by running runserver.py and navigating to localhost. In a production environment, the Dash app must be deployed to a server.

Dash is written on top of Flask. Hence, deploying a Dash app is exactly the same as deploying a Flask app. Refer to Flask's Deployment Guide for more details.

Note the following:

While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well.

A WSGI server should be used instead. Simple-to-use, affordable solutions include PythonAnywhere and Heroku.

Project Structure

.
├── LICENSE
├── README.md
├── classifier
│   ├── data
│   │   ├── x_test.npy
│   │   ├── x_train.npy
│   │   ├── y_test.npy
│   │   └── y_train.npy
│   ├── emails.csv
│   ├── exec.py
│   ├── metrics
│   │   └── 20200925163152_plot.png
│   ├── models
│   │   └── 20200925163152_spam_classifier.h5
│   ├── predict_input.py
│   └── process_data.py
├── dash_app
│   ├── app.py
│   ├── assets
│   │   └── main.css
│   ├── bm_alg.py
│   ├── callbacks.py
│   ├── data.py
│   ├── emails.csv
│   ├── index.py
│   ├── predict.py
│   ├── routes.py
│   ├── stats.py
│   └── temp
│       └── app_files
│           ├── output.csv
│           └── stats_output.html
├── requirements.txt
└── runserver.py

Classifier

process_data.py: Cleans the dataset by removing irrelevant content from the email text (punctuation, stop words, hyperlinks, etc.), then represents the result as a feature matrix from which the model can learn relationships between the token sequences and their labels.
exec.py: Trains and saves the classifier model.
predict_input.py: Integrates the classifier with the Dash web GUI. Given user input, predicts whether the email is spam.
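
The preprocessing described for process_data.py can be pictured with a small standard-library sketch: strip hyperlinks and punctuation, drop stop words, then encode each email as a fixed-length sequence of integer ids. Everything here is an assumption for illustration, including the names clean_text, build_vocab, and encode, the tiny stop-word list, and the zero-padding scheme. The project itself may well rely on library tokenizers instead.

```python
import re
import string

# A tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "is", "and", "of", "in", "for", "you", "your"}

URL_PATTERN = re.compile(r"(https?://|www\.)\S+")
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase, strip hyperlinks and punctuation, and drop stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each distinct token to a positive integer id (0 is reserved for padding)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(tokens, vocab, max_len):
    """Turn tokens into a fixed-length id sequence, truncating or zero-padding."""
    ids = [vocab[tok] for tok in tokens if tok in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


emails = [
    "Claim your FREE prize now!!! Visit http://example.com/win",
    "Are we still meeting for lunch tomorrow?",
]
tokens = [clean_text(e) for e in emails]
vocab = build_vocab(tokens)
matrix = [encode(t, vocab, max_len=6) for t in tokens]

print(tokens[0])  # ['claim', 'free', 'prize', 'now', 'visit']
print(matrix)     # [[1, 2, 3, 4, 5, 0], [6, 7, 8, 9, 10, 11]]
```

Each row of the matrix is one email, which is the shape a sequence model such as an LSTM expects after an embedding layer.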

Dash App

app.py: Defines the Dash application.
runserver.py: Runs the application defined above. Integrates all routes and callbacks for the Dash application.
routes.py: Specifies the routes (URLs) of the application. The application has multiple pages, but the browser never needs a full refresh because the page content is updated dynamically here. Also defines the functions and request handlers that serve local files, allowing the user to download exported results.
index.py: Layout for '/' (homepage)
stats.py: Layout for '/stats'
predict.py: Layout for '/predict'
callbacks.py: Defines callback functions for the Dash app. This is how the application is able to dynamically update its content (tables, graphs, etc.) based on the user input (search bar, dropdown, etc.).
data.py: Extracts data from the dataset and exports it using pandas.
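bm_alg.py: The search algorithm, an implementation of Boyer-Moore. For a pattern of length m, a text of length n, and an alphabet of size k, preprocessing takes O(m+k) time. The searching phase takes O(n) time when the pattern does not occur in the text; with many occurrences, the classic algorithm can degrade to O(mn) unless the Galil rule is applied.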
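
To give a feel for the search in bm_alg.py, here is a minimal sketch of a simplified Boyer-Moore variant that uses only the bad-character rule. It is not the project's implementation: the function names are hypothetical, the good-suffix rule is omitted for brevity, and the table is a dictionary rather than an array over the alphabet.

```python
def build_bad_char_table(pattern):
    """Record the last index at which each character occurs in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def boyer_moore_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = build_bad_char_table(pattern)
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Align the mismatched text character with its last occurrence
            # in the pattern, or jump past it if it never occurs there.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


print(boyer_moore_search("free money, free prize", "free"))  # [0, 12]
print(boyer_moore_search("aaaa", "aa"))                      # [0, 1, 2]
```

The right-to-left comparison is what allows the pattern to skip several text characters at once on a mismatch, which is why the algorithm often inspects fewer than n characters in practice.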

Contributors

Zhang Zeyu
Jared Marc Song Kye-Jet
Lee Zhan Hong
Ivan Ng Say Mun
Bill Eng De Xian
Nicholas Ooi Jun Wei

License

Use of this project is governed by the MIT License.

Plagiarism

This project is an assignment submission in partial fulfillment of the Singapore Institute of Technology (SIT) module ICT1002 Programming Fundamentals.

The University's policy on copying does not allow students to copy software as well as assessment solutions from another person.
