schladt/malware-feature-vectors

malware-feature-vectors

Introduction

  • This project contains a collection of functions to manipulate and analyze malware feature vectors

  • From Wikipedia: “… a feature vector is an n-dimensional vector of numerical features that represent some object”

  • Malware features numerically represent various aspects of (mostly) dynamic analysis and can be used to determine relational proximity between malware samples and families

  • Disclaimer: This project is very alpha / prototypical. Your mileage may vary.

  • For an in-depth overview of the project, you may want to check out my DerbyCon presentation:
    https://www.youtube.com/watch?v=f74w4sOlQ5A

Build Requirements

  • Cuckoo Sandbox setup (Linux host) -- cuckoosandbox.org
  • MySQL database with the tables defined in mfv_tables.sql, which can be created with
$ mysql -u <mysql_user> -p <mfv_db_name> < mfv_tables.sql
  • Python 2.7.x with pip
  • Required Python packages can be installed with
$ sudo pip install -r requirements.txt
  • config.py must be modified with your credentials -- see config.py.example
  • If using Plotly, your account credentials must be stored in ~/.plotly -- see plot.ly for more info

Important file descriptions

  • mfv.py

    • Core resource of the project. Defines the FeatureVector class.
    • Defines functions to manipulate vectors, perform statistical analysis, and display plots
  • examples/autogen_families.py

    • Groups vectors based on shared tags (family, filetype, source)
    • Normalizes vectors and creates “archetypes”
    • Plots all vectors in each family together with the family archetype
  • examples/plot_archetypes.py

    • Similar to autogen_families.py
    • Uses archetypes from the database instead of creating them
  • examples/best_guess.py

    • Compares test vector to each stored archetype
    • Normalizes & prunes sample under test to fit archetype
    • Finds Euclidean distance from test vector to archetype
    • Suggests likely family based on the previously calculated distances
  • examples/compare_to_archetypes.py

    • Normalizes & prunes sample under test to fit archetype
    • Plots test vector against all archetypes
    • Graphical way to verify the output of best_guess.py
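
The archetype and best-guess steps described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in mfv.py or the example scripts: the function names (normalize, prune, make_archetype, best_guess), the dict-based vector layout, the feature names, unit-length normalization, the element-wise mean as the "archetype", and the prune-then-normalize order are all assumptions made for the sake of the example.

```python
import math


def normalize(vector):
    """Scale a {feature: value} dict to unit length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in vector.values()))
    if norm == 0:
        return dict(vector)
    return {name: v / norm for name, v in vector.items()}


def prune(vector, archetype):
    """Keep only the features the archetype defines; missing ones become 0."""
    return {name: vector.get(name, 0.0) for name in archetype}


def make_archetype(vectors):
    """Element-wise mean of the normalized vectors (assumed archetype definition)."""
    normalized = [normalize(v) for v in vectors]
    names = sorted(set().union(*normalized))
    count = len(normalized)
    return {n: sum(v.get(n, 0.0) for v in normalized) / count for n in names}


def euclidean_distance(a, b):
    """Euclidean distance between two dicts that share the same feature names."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


def best_guess(sample, archetypes):
    """Return the closest family and the distance to every archetype."""
    distances = {}
    for family, archetype in archetypes.items():
        fitted = normalize(prune(sample, archetype))
        distances[family] = euclidean_distance(fitted, archetype)
    return min(distances, key=distances.get), distances


# Hypothetical feature counts, invented purely for illustration.
families = {
    "family_a": [{"file_writes": 10, "net_connections": 1},
                 {"file_writes": 8, "net_connections": 2}],
    "family_b": [{"file_writes": 1, "net_connections": 9},
                 {"file_writes": 2, "net_connections": 12}],
}
archetypes = {name: make_archetype(vs) for name, vs in families.items()}

sample = {"file_writes": 9, "net_connections": 1, "registry_reads": 40}
family, distances = best_guess(sample, archetypes)
print(family, distances)
```

In this toy data the sample's registry_reads feature is pruned away because neither archetype defines it, and the remaining features sit closest to family_a. Whether pruning should happen before or after normalization changes the resulting distances, so the order used here should be treated as one possible choice.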

MISC Helper Scripts

  • add_tags.py - imports tags (e.g. malware family, source, etc.) into database
  • create_feature_vectors.py - creates CSV of feature vectors using Cuckoo Sandbox REST API
  • add_vectors.py - imports feature vector CSV into database
