
School project for malicious URL detection with dataset comparison


Ben-Nupa/malicious_urls_detection


Malicious URL detection with dataset comparison

Python

Project for the security course at CentraleSupelec, CS track.

Quickstart

On Windows

Use Python 3.4, 3.5, or 3.6 (required for compatibility with TensorFlow).

python -m venv venv
venv\Scripts\activate.bat
pip install -r requirements.txt

Examples

See the file predict.py for examples.

Datasets

Dataset 1: Unbalanced dataset (80% safe URLs, 20% malicious) that contains repeated URLs

Dataset 2: Balanced dataset

Dataset 3: Dated malicious URLs, built from PhishTank and Malware Domains Blocklist
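
The properties listed above (class balance and repeated URLs) can be checked before training. The following is a minimal sketch, not code from this repository: the function names `summarize_dataset` and `deduplicate`, the `(url, label)` row layout, and the label encoding (1 = malicious, 0 = safe) are all assumptions made for illustration.

```python
from collections import Counter


def summarize_dataset(rows):
    """Return size, malicious ratio, and duplicate ratio for (url, label) rows.

    Assumed (hypothetical) encoding: label 1 = malicious, label 0 = safe.
    """
    rows = list(rows)
    total = len(rows)
    if total == 0:
        return {"total": 0, "malicious_ratio": 0.0, "duplicate_ratio": 0.0}

    label_counts = Counter(label for _, label in rows)
    url_counts = Counter(url for url, _ in rows)

    # Every occurrence of a URL beyond its first counts as a duplicate row.
    duplicates = total - len(url_counts)

    return {
        "total": total,
        "malicious_ratio": label_counts[1] / total,
        "duplicate_ratio": duplicates / total,
    }


def deduplicate(rows):
    """Keep only the first occurrence of each URL, preserving order."""
    seen = set()
    unique = []
    for url, label in rows:
        if url not in seen:
            seen.add(url)
            unique.append((url, label))
    return unique


# Toy data for illustration only; not taken from any of the datasets above.
sample = [
    ("http://example.com/", 0),
    ("http://example.com/", 0),
    ("http://example.org/login", 0),
    ("http://example.net/a", 0),
    ("http://bad.example/phish", 1),
]

stats = summarize_dataset(sample)
print(stats)
print(len(deduplicate(sample)), "unique URLs")
```

Repeated URLs matter when comparing datasets because the same URL can end up in both the training and the test split, which may inflate measured accuracy. Removing duplicates before splitting is one way to avoid that.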

Credits

The code here is based on the work of the following people:

  • Hillary Sanders and Joshua Saxe - Garbage In, Garbage Out: How Purportedly Great ML Models Can Be Screwed Up by Bad Data - paper, slides
  • Joshua Saxe and Konstantin Berlin - eXpose: A Character-Level Convolutional Neural Network with Embeddings for Detecting Malicious URLs, File Paths and Registry Keys - paper and their GitHub
