RonakSumbaly/Malware-Classification

 
 

Malware Classification

MALICIA - Malware Classification - CS260 Machine Learning Algorithms Project

## Background

The continuous rise in malware attacks has led to the development of many malware detection systems that use several techniques to identify known and unknown malware. However, most of these systems rely on a previously known malware "signature". To escape detection, malware authors have started obfuscating their code. This project presents a technique that uses machine learning to classify malware samples into their respective families. The main reason for using machine learning is that, although code can be obfuscated with techniques such as garbage code insertion and instruction permutation, malware belonging to one family performs similar functions at the lower level and therefore has to generate similar opcodes and similar patterns.
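To make the opcode argument concrete, here is a minimal sketch of one common way to compare opcode sequences: counting overlapping opcode n-grams and measuring cosine similarity between the counts. This is an illustration only, not necessarily the feature extraction used in this project. The function names and the three opcode traces are hypothetical.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count overlapping n-grams in a sequence of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical traces: the second is the first with garbage "nop"
# instructions inserted; the third is an unrelated program.
original = ["push", "mov", "call", "xor", "mov", "call", "ret"]
obfuscated = ["push", "nop", "mov", "call", "xor", "nop", "mov", "call", "ret"]
unrelated = ["add", "sub", "jmp", "cmp", "jne", "add", "ret"]

sim_same = cosine_similarity(opcode_ngrams(original), opcode_ngrams(obfuscated))
sim_other = cosine_similarity(opcode_ngrams(original), opcode_ngrams(unrelated))
print(f"original vs obfuscated: {sim_same:.2f}")
print(f"original vs unrelated:  {sim_other:.2f}")
```

In this toy example the garbage instructions break some bigrams, but the obfuscated trace still shares several with the original, while the unrelated trace shares none.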

## Contents

  • Scripts - source code
  • Graphs - graphs generated for project
  • Poster - final poster presentation
  • Report - final report

## Methodology

  • Iteration 1 - Modeling based on Feature Extraction
  • Iteration 2 - Modeling after Feature Selection & Parameter Tuning for Dimensionality Reduction
  • Iteration 3 - Interpret Best Features in Model Building Process
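The feature selection step in Iteration 2 and the interpretation step in Iteration 3 can be sketched as a ranking of features by how well they separate families. The sketch below assumes information gain over binary feature presence as the scoring function; the README does not say which selection method the project used, and the feature names, family labels, and helper functions are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, feature):
    """Entropy reduction from splitting samples on presence of `feature`.

    `samples` is a list of sets, each holding the features present in one sample.
    """
    present = [y for x, y in zip(samples, labels) if feature in x]
    absent = [y for x, y in zip(samples, labels) if feature not in x]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - remainder


def select_top_k(samples, labels, k):
    """Return the k features with the highest information gain (ties broken by name)."""
    vocabulary = set().union(*samples)
    ranked = sorted(vocabulary, key=lambda f: (-information_gain(samples, labels, f), f))
    return ranked[:k]


# Hypothetical data: each sample is the set of opcode bigrams it contains.
samples = [
    {"mov_call", "xor_mov", "push_mov"},
    {"mov_call", "xor_mov", "call_ret"},
    {"add_sub", "push_mov", "call_ret"},
    {"add_sub", "jmp_cmp", "push_mov"},
]
labels = ["family_a", "family_a", "family_b", "family_b"]

print(select_top_k(samples, labels, 3))
```

The selected features would then be passed to a classifier such as Random Forest, and the ranking itself gives a first view of which features matter most.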

## Concluding Remarks

We successfully classified the malware samples into their respective families, with a maximum accuracy of 91% achieved by Random Forest. We achieved these results with minimal overhead from the algorithms used to extract, select, and classify the features. The methodology can be used to classify malware into its respective families even if obfuscation and polymorphism have been employed to change the look and feel of the malware.

Languages

  • Python 100.0%