Malware category classification using machine learning algorithms (Random Forest, XGBoost, k-NN, Naive Bayes and MLP).
This project was developed with Python 3.6.8.
Install the dependencies with: pip install -r requirements.txt
The n-gram implementation is partly based on code from the following repository: https://github.com/kaanege/malware
The dataset used for this project is available at: https://github.com/naisofly/Static-Malware-Analysis
-
Set the input data directories for both feature extraction methods: common.py for n-grams and feature_extraction.py for header analysis.
-
If you use a dataset other than the one above, any new subdirectories must be specified in feature_extraction.py.
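As a rough illustration of what specifying subdirectories can involve, the sketch below assumes one subdirectory per malware category under a single input directory and pairs every file with its category label. `SUBDIRECTORIES`, `collect_samples` and the folder names are hypothetical placeholders and are not taken from feature_extraction.py.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical mapping from category label to the subdirectory holding its samples.
# These names are placeholders; replace them with the real folders of your dataset.
SUBDIRECTORIES: Dict[str, str] = {
    "category_a": "category_a_samples",
    "category_b": "category_b_samples",
}


def collect_samples(data_dir: str, subdirs: Dict[str, str]) -> List[Tuple[Path, str]]:
    """Return (file path, label) pairs for every file in each listed subdirectory."""
    root = Path(data_dir)
    samples = []
    for label, subdir in subdirs.items():
        folder = root / subdir
        # Fail early if a configured subdirectory does not exist.
        if not folder.is_dir():
            raise FileNotFoundError(f"missing subdirectory: {folder}")
        # Sort for a deterministic order; skip nested folders.
        for path in sorted(folder.iterdir()):
            if path.is_file():
                samples.append((path, label))
    return samples
```

Keeping the label-to-folder mapping in one place means a new dataset only requires editing that mapping rather than the extraction logic.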
-
Run main.sh. Note that byte_features_extractor.py expects its arguments in the following format:
python3 src/n-grams/byte_features_extractor.py [ngram e.g. 2gram] [number of cores on your machine]
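The extractor's internals are not described here. As a hedged sketch of what byte n-gram counting with the documented arguments could look like, the code below parses an argument such as `2gram` plus a core count, counts overlapping n-byte sequences, and spreads files across worker processes with `multiprocessing.Pool`. All function names are hypothetical, and the sketch assumes raw binary input files; the actual script may read its input in a different format.

```python
import re
from collections import Counter
from functools import partial
from multiprocessing import Pool
from pathlib import Path
from typing import Dict, List, Tuple


def parse_ngram_arg(arg: str) -> int:
    """Turn an argument such as '2gram' into the integer n (here 2)."""
    match = re.fullmatch(r"(\d+)gram", arg)
    if match is None or int(match.group(1)) < 1:
        raise ValueError(f"expected an argument like '2gram', got {arg!r}")
    return int(match.group(1))


def parse_args(argv: List[str]) -> Tuple[int, int]:
    """Parse '[ngram] [number of cores]', e.g. ['2gram', '4'] -> (2, 4)."""
    if len(argv) != 2:
        raise ValueError("usage: byte_features_extractor.py <Ngram> <cores>")
    n = parse_ngram_arg(argv[0])
    cores = int(argv[1])
    if cores < 1:
        raise ValueError("number of cores must be at least 1")
    return n, cores


def count_byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping run of n bytes, keyed by its hex string."""
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


def count_file_ngrams(path: str, n: int) -> Counter:
    """Read one file as raw bytes and count its byte n-grams."""
    return count_byte_ngrams(Path(path).read_bytes(), n)


def extract_all(paths: List[str], n: int, cores: int) -> Dict[str, Counter]:
    """Count n-grams for many files in parallel using `cores` worker processes."""
    with Pool(processes=cores) as pool:
        counts = pool.map(partial(count_file_ngrams, n=n), paths)
    return dict(zip(paths, counts))
```

For example, `parse_args(["2gram", "4"])` returns `(2, 4)`, and counting 2-grams over the bytes `4d 5a 90` yields the keys `4d5a` and `5a90`. Parallelising per file rather than within a file keeps each worker independent, which is why the core count maps directly to the pool size.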