Python scripts to analyze an authentication dataset.
The authentication dataset from LANL (http://csr.lanl.gov/data/auth/) provides a valuable benchmark dataset for researchers in cybersecurity and/or graphs/networks.
We recommend using the Anaconda Python 3 distribution.
We begin by generating two files from the original dataset:
time_secs_binary_f32.dat
- a binary file containing just the time (secs) data (32-bit values)auth_graph_adjlist.dat
- an ASCII file containing the global graph (as an adjacency list)
You have two choices for obtaining these two files. The script create_time_graph_files.py
will
generate both of them. However, it took about 8 hours on a laptop. The other choice is to simply
download the graph file from http://trustedci.org/data and generate
the other file using create_time_file.py
(which takes just a few minutes).
$ ipython --matplotlib
Python 3.4.2 |Anaconda 2.1.0
...
Using matplotlib backend: MacOSX
In [1]: %run create_time_file
...
In [2]: %run histo_time
Interactive matplotlib window with pan, zoom, rubberband buttons
In [3]: %run readG_draw
After some time, the full graph will be plotted (below, for what it's worth). You can then interactively pan and zoom in on regions of interest.
Global, static authN graph
In [4]: %run readG_hub_subgraph
A hub as a subgraph