GAD_MLSP2015

MAIN FILE:

get_top_anomaly.py

GOAL:

extract prioritized clusters of anomalous samples and their salient (continuous, real-valued) feature subsets from DATA

Based on our MLSP paper:

Detecting Clusters of Anomalies on Low-Dimensional Feature Subsets with Application to Network Traffic Flow Data

In that paper, public-domain TCPdump traces of background web enterprise network packet traffic and botnet command and control traffic that also uses port 80 (two different domains) were processed by wireshark to create netflows/sessions from which the bidirectional sequence of packet sizes were extracted to form the DATA used in this case study. The anomaly detection framework is fully unsupervised in that the botnet activity to be detected is only resident in the (unlabelled) test set, together with a fraction of the (nominal) background traffic samples (multifold cross validation was performed). We therefore did not use the categorical features from the headers. Also, the packet payloads were unavailable (public-domain datasets), and the timing features were not used (because the background and botnet traces were recorded in different domains).

Paper Links:

arvix:

http://arxiv.org/pdf/1511.01047.pdf

MLSP'15 IEEE Xplore:

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7324326&tag=1

Also see: actively learning to distinguish suspicious from innocuous anomalies in a batch of vehicle tracks

Active learning paper in spie for 2 classes to distinguish suspicious from innocuous anomalies.

Paper Link:

http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1881997

Pending journal paper that describes a both active and semi-supervised framework for >2 (generally novel/rare) classes.

###################################################

Variables:

TRAIN:

a .txt file, 2-d array with (n_sample,n_feature)
used to train the background model (GMMs, MIs)

DATA:

a .txt file, 2-d array with (n_sample,n_feature)
extract group anomalies from DATA

LABEL:

each DATA sample's label

normal_cat:

indicate the normal label

M_max:

max number of components each gmm can attain

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
GAD_module.py		GAD_module.py
README.md		README.md
get_top_anomaly.py		get_top_anomaly.py
gmm_module.py		gmm_module.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GAD_module.py

GAD_module.py

README.md

README.md

get_top_anomaly.py

get_top_anomaly.py

gmm_module.py

gmm_module.py

Repository files navigation

GAD_MLSP2015

###################################################

About

Releases

Packages

Languages

zhicongqiu/GAD_MLSP2015

Folders and files

Latest commit

History

Repository files navigation

GAD_MLSP2015

###################################################

About

Resources

Stars

Watchers

Forks

Languages