CISC850 - MACHINE LEARNING
This is the repository for our project: malware detection through machine learning.
Project Description: We analyze the award-winning Kaggle code that successfully classifies malware using various machine learning algorithms, and we optimize it to reduce the memory consumption and execution time of the scripts. Unlike the Kaggle team, we follow a parallel processing approach. Our rewritten, optimized Kaggle scripts are in this repository. We would like to thank the Kaggle team for giving us a head start.
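As a rough illustration of the parallel processing idea, the sketch below spreads per-file feature extraction across worker processes. It is not taken from the repository scripts: the function names, the byte-histogram feature, and the default worker count are all assumptions chosen for illustration.

```python
from collections import Counter
from multiprocessing import Pool
from typing import List


def byte_histogram(data: bytes) -> List[int]:
    """Return a 256-bin count of byte values (a hypothetical stand-in feature)."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def features_for_file(path: str) -> List[int]:
    """Read one file and compute its features; intended to run in a worker process."""
    with open(path, "rb") as handle:
        return byte_histogram(handle.read())


def extract_parallel(paths: List[str], workers: int = 4) -> List[List[int]]:
    """Compute features for many files, spreading the files across processes."""
    with Pool(processes=workers) as pool:
        # Pool.map preserves input order, so row i corresponds to paths[i].
        return pool.map(features_for_file, paths)
```

Calling `extract_parallel` on a list of sample file paths returns one feature row per file. Because each file is processed independently, the work divides cleanly across processes; on platforms that spawn rather than fork, the call should sit under an `if __name__ == "__main__":` guard.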
Collaborators
Vinit Singh (vinitvs@udel.edu)
Abhilash Parthasarthy (abhipart@udel.edu)
Mingxing Gong (mingxing.gong@gmail.com)
1. Extracts a small set of training and testing data from a large dataset.
2. Extracts features from this subset of the large dataset.
3. Performs feature engineering and trains the model to minimize log loss. The model correctly classifies malware samples into their families and also addresses the overfitting problem.
4. Analyzes the relation between the size of the training data and the time it takes to train the model.
5. Analyzes the relation between the size of the training data and the log loss.
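For reference, the log loss mentioned above can be computed as the mean negative log of the probability assigned to each sample's true class. The following is a minimal sketch, assuming integer class labels that index into each row of predicted probabilities; the function name and the clipping constant `eps` are illustrative choices, not taken from the project scripts.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each sample."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip so that a predicted probability of exactly 0 or 1 stays finite.
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

A model that assigns probability 0.5 to the true class of every sample scores log(2), about 0.693. Lower values mean the predicted probabilities concentrate more on the correct malware family.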