GitHub - urwithajit9/fluxbuster: Automatically exported from code.google.com/p/fluxbuster

[Introduction]

Fluxbuster is designed to cluster and classify candidate fast flux networks from 
SIE DNS traffic data.  Data is read from timestamped compressed data files
and stored in a relational database.  The clustering process uses an 
agglomerative hierarchical clustering algorithm.  Classification involves a 
pre-constructed decision tree.
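
As a rough illustration of the clustering step, the following Java sketch runs
agglomerative hierarchical clustering over a small precomputed distance matrix,
using either single or complete linkage and stopping at a cut height.  It is
not Fluxbuster's implementation: the class name, method names and toy
distances are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: naive agglomerative clustering over a distance matrix.
// AgglomerativeSketch and its methods are hypothetical, not Fluxbuster's API.
final class AgglomerativeSketch {

    // Toy symmetric distance matrix for four items (made-up values).
    static final double[][] TOY = {
        {0.0, 0.1, 0.8, 0.9},
        {0.1, 0.0, 0.6, 0.9},
        {0.8, 0.6, 0.0, 0.2},
        {0.9, 0.9, 0.2, 0.0}
    };

    // Distance between two clusters: the minimum pairwise distance for single
    // linkage, the maximum pairwise distance for complete linkage.
    static double linkage(List<Integer> a, List<Integer> b, double[][] dist, boolean complete) {
        double best = complete ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (int i : a) {
            for (int j : b) {
                best = complete ? Math.max(best, dist[i][j]) : Math.min(best, dist[i][j]);
            }
        }
        return best;
    }

    // Repeatedly merges the two closest clusters until the closest pair is
    // farther apart than maxCutHeight or only one cluster remains.
    static List<List<Integer>> cluster(double[][] dist, boolean complete, double maxCutHeight) {
        List<List<Integer>> clusters = new ArrayList<List<Integer>>();
        for (int i = 0; i < dist.length; i++) {
            List<Integer> singleton = new ArrayList<Integer>();
            singleton.add(i);
            clusters.add(singleton);
        }
        while (clusters.size() > 1) {
            int bestA = -1;
            int bestB = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = linkage(clusters.get(a), clusters.get(b), dist, complete);
                    if (d < bestDist) {
                        bestDist = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            if (bestDist > maxCutHeight) {
                break; // no remaining pair is close enough to merge
            }
            // bestB > bestA, so removing bestB does not shift bestA.
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        return clusters;
    }

    public static void main(String[] args) {
        System.out.println(cluster(TOY, false, 0.75)); // single linkage
        System.out.println(cluster(TOY, true, 0.75));  // complete linkage
    }
}
```

With these toy distances and a cut height of 0.75, single linkage merges all
four items into one cluster, while complete linkage stops at two clusters
because the farthest pair across them is 0.9 apart.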

[Compilation]

Build Requirements:
1. Java JDK 1.6 or greater
2. ant

To compile Fluxbuster, execute the following command from the directory 
containing the build.xml file:

ant build

This will build all of the class files and place them into the bin directory.

To clean up the build, execute the following command from the same directory:

ant clean

NOTE: all of the libraries necessary to build and execute Fluxbuster are 
included in the lib directory.

[Documentation]

The javadocs for Fluxbuster are located in the doc directory.

They can be generated from the source code by executing:

ant doc

To clean up the javadocs execute:

ant clean-doc

[Installation]

Fluxbuster currently supports only PostgreSQL.  Install PostgreSQL version 8.4
or greater on your database host, then complete the following steps to install
the Fluxbuster database.

1. Create a database user to own the Fluxbuster database.  If Fluxbuster will be
installed on the same host as the database, this user only needs local access 
to the database.  Otherwise, the user must be given permission to access the 
database from the host running Fluxbuster.

2. Create a database to house the Fluxbuster db schema.

3. Grant your database user ownership of the newly created database.

4. As the Fluxbuster database user, execute the fluxbuster_schema.sql file found
in the resources directory against the Fluxbuster database.  This will create 
the tables, indexes, etc. required by Fluxbuster.

[Configuration]

Once the program has been built, Fluxbuster is configured through several
.properties files in the bin directory.  What follows is a description of each
of the attributes in the .properties files.

fluxbuster.properties

	SELECTED_CFD_FILE : The path to a list of domains that should be clustered
		regardless of their candidate score.  If this property is not 
		specified it is ignored.
	
	GOOD_CANDIDATE_THRESHOLD : The candidate score threshold.  Those candidate 
		domains with a candidate score greater than the threshold are considered
		for clustering.  This value should be between 0.0 and 1.0.
		ex. 0.5
	
	MAX_CANDIDATE_DOMAINS : The total number of candidate domains to cluster.
		ex. 1000
	
	DIST_MATRIX_MULTITHREADED : Whether the calculation of the distance matrix
		should be multithreaded.  Valid values are 'true' or 'false'.
		
	DIST_MATRIX_NUMTHREADS : The number of threads to use in multithreaded 
		distance matrix calculation.  This value must be a positive integer. If
		DIST_MATRIX_MULTITHREADED is set to 'false' this value is ignored.
		
	LINKAGE_TYPE :  The linkage type to use during hierarchical clustering.  
		Valid values are 'Single' or 'Complete'.
		
	MAX_CUT_HEIGHT : The maximum cut height used during hierarchical clustering.
		The value should be between 0.0 and 1.0.
		ex. 0.75
		
	CANDIDATE_FLUX_DIR : The directory containing the SIE source files.

	DBINTERFACE_CONNECTINFO : The JDBC connection string to the fluxbuster database.  
		ex. jdbc:postgresql://host.example.com:54321/fluxbuster?user=sample&password=secret
		
	DBINTERFACE_DRIVER : The full class name of the JDBC driver class to use.
		ex. org.postgresql.Driver
		
	DBINTERFACE_CLASS : The full class name of the DBInterface implementation to
		use.
		ex. edu.uga.cs.fluxbuster.db.PostgresDBInterface
	
	The following options relate to the BoneCP connection pool.  In most cases
	the default options will suffice.  For further information about each 
	option see:
	http://jolbox.com/bonecp/downloads/site/apidocs/com/jolbox/bonecp/BoneCPConfig.html
			
	DBINTERFACE_PARTITIONS : BoneCPConfig.setPartitionCount
	
	DBINTERFACE_MIN_CON_PER_PART : BoneCPConfig.setMinConnectionsPerPartition
	
	DBINTERFACE_MAX_CON_PER_PART : BoneCPConfig.setMaxConnectionsPerPartition
	
	DBINTERFACE_RETRY_ATTEMPTS : BoneCPConfig.setAcquireRetryAttempts
	
	DBINTERFACE_RETRY_DELAY : BoneCPConfig.setAcquireRetryDelayInMs
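
To make the attribute descriptions above concrete, the following Java sketch
embeds a sample fluxbuster.properties as a string, loads it with
java.util.Properties and checks a few of the documented constraints.  The
sample values are placeholders, and the checker class is hypothetical rather
than part of Fluxbuster.  It also assumes that the checked attributes are
required and that values are case-sensitive, which Fluxbuster itself may not
enforce.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Fluxbuster: checks documented value ranges.
final class FluxbusterPropertiesCheck {

    // Example values only; the path, host and credentials are placeholders.
    static final String SAMPLE =
          "GOOD_CANDIDATE_THRESHOLD = 0.5\n"
        + "MAX_CANDIDATE_DOMAINS = 1000\n"
        + "DIST_MATRIX_MULTITHREADED = true\n"
        + "DIST_MATRIX_NUMTHREADS = 4\n"
        + "LINKAGE_TYPE = Single\n"
        + "MAX_CUT_HEIGHT = 0.75\n"
        + "CANDIDATE_FLUX_DIR = /path/to/sie/files\n"
        + "DBINTERFACE_CONNECTINFO = jdbc:postgresql://host.example.com:54321/"
        + "fluxbuster?user=sample&password=secret\n"
        + "DBINTERFACE_DRIVER = org.postgresql.Driver\n"
        + "DBINTERFACE_CLASS = edu.uga.cs.fluxbuster.db.PostgresDBInterface\n";

    // Parses .properties text from a string.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    // True if value parses as a number in [0.0, 1.0].
    static boolean inUnitInterval(String value) {
        if (value == null) {
            return false;
        }
        try {
            double d = Double.parseDouble(value.trim());
            return d >= 0.0 && d <= 1.0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if value parses as an integer greater than zero.
    static boolean isPositiveInt(String value) {
        if (value == null) {
            return false;
        }
        try {
            return Integer.parseInt(value.trim()) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns one message per violated constraint; empty means no problems found.
    static List<String> validate(Properties p) {
        List<String> errors = new ArrayList<String>();
        if (!inUnitInterval(p.getProperty("GOOD_CANDIDATE_THRESHOLD"))) {
            errors.add("GOOD_CANDIDATE_THRESHOLD must be between 0.0 and 1.0");
        }
        if (!inUnitInterval(p.getProperty("MAX_CUT_HEIGHT"))) {
            errors.add("MAX_CUT_HEIGHT must be between 0.0 and 1.0");
        }
        String linkage = p.getProperty("LINKAGE_TYPE", "").trim();
        if (!linkage.equals("Single") && !linkage.equals("Complete")) {
            errors.add("LINKAGE_TYPE must be 'Single' or 'Complete'");
        }
        String threaded = p.getProperty("DIST_MATRIX_MULTITHREADED", "").trim();
        if (!threaded.equals("true") && !threaded.equals("false")) {
            errors.add("DIST_MATRIX_MULTITHREADED must be 'true' or 'false'");
        } else if (threaded.equals("true")
                && !isPositiveInt(p.getProperty("DIST_MATRIX_NUMTHREADS"))) {
            errors.add("DIST_MATRIX_NUMTHREADS must be a positive integer");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(load(SAMPLE)));
    }
}
```

The thread count is only checked when DIST_MATRIX_MULTITHREADED is 'true',
mirroring the note that it is otherwise ignored.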
	
commons-logging.properties

	org.apache.commons.logging.Log : Sets the logging implementation class.  By
		default this is set to 'org.apache.commons.logging.impl.SimpleLog' which
		can be configured in the simplelog.properties file
		
simplelog.properties

	org.apache.commons.logging.simplelog.defaultlog : The default minimum logging
		level.
		
	org.apache.commons.logging.simplelog.showdatetime : Set to true to display a
		timestamp for each log entry.
		
	org.apache.commons.logging.simplelog.showlogname : Set to true if you want 
		the Log instance name to be included in output messages.

[Usage]

You can execute Fluxbuster via the fluxbuster.sh bash script.

usage: fluxbuster.sh

If none of the options g, f, s, c are specified then the program will execute as
if all of them have been specified.  Otherwise, the program will only execute
the options specified.

-?,--help                Print help message.
-c,--classify-clusters   Classify clusters. (Optional)
-d,--start-date <arg>    The start date of the input data.  Should be in
                         yyyyMMdd format.
-e,--end-date <arg>      The end date of the input data.  Should be in yyyyMMdd
                         format.
-f,--calc-features       Calculate cluster features. (Optional)
-g,--generate-clusters   Generate clusters. (Optional)
-s,--calc-similarity     Calculate cluster similarities. (Optional)
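
The yyyyMMdd notation expected by -d and -e matches the pattern syntax of
java.text.SimpleDateFormat.  The following Java sketch checks such an argument
and reformats it as yyyy-MM-dd.  The class is hypothetical, and whether
Fluxbuster validates its date arguments this way is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of Fluxbuster.
final class DateArgSketch {

    // Returns the date reformatted as yyyy-MM-dd, or null if the argument is
    // not exactly eight digits forming a valid calendar date.
    static String toIsoDate(String arg) {
        if (arg == null || !arg.matches("\\d{8}")) {
            return null;
        }
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd");
        in.setTimeZone(utc);
        in.setLenient(false); // reject values such as month 13 instead of rolling over
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");
        out.setTimeZone(utc);
        try {
            Date parsed = in.parse(arg);
            return out.format(parsed);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toIsoDate("20101113"));
    }
}
```

For example, the argument 20101113 denotes 13 November 2010, while 2010-11-13
and 20101340 are both rejected by this check.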
		
To access the results of Fluxbuster you can use the following SQL examples to 
query the Fluxbuster database.

1. Get all of the domains and the cluster to which each belongs for the run on 
2010-11-13.

	select
		clusters.cluster_id,
		domains.domain_name,
		domains.second_level_domain_name
	from
		clusters
			join
		domains
			on
		clusters.log_date = domains.log_date
		and clusters.domain_id = domains.domain_id
	where
		clusters.log_date = '2010-11-13'
		
2. Get all of the cluster features for each cluster for the run on 2010-11-13.

	select
		cluster_feature_vectors.*
	from
		clusters
			join
		cluster_feature_vectors
			on
		clusters.log_date = cluster_feature_vectors.log_date
		and clusters.cluster_id = cluster_feature_vectors.cluster_id
	where
		clusters.log_date = '2010-11-13'
		
3. Get all of the classified clusters for the run on 2010-11-15.

	select
		distinct cluster_classes.*
	from
		clusters
			join
		cluster_classes
			on
		clusters.log_date = cluster_classes.log_date
		and clusters.cluster_id = cluster_classes.cluster_id
	where
		clusters.log_date = '2010-11-15'
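
The same queries can be issued from Java through JDBC.  The following sketch
runs the first example with the run date as a parameter.  It is a hypothetical
client, not part of Fluxbuster, and it assumes that log_date is a date column
and that the PostgreSQL JDBC driver from the lib directory is on the classpath.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical client, not part of Fluxbuster.
final class ClusterQuerySketch {

    // Same join as example 1, with the run date supplied as a parameter.
    static final String SQL =
          "select clusters.cluster_id, domains.domain_name, "
        + "domains.second_level_domain_name "
        + "from clusters join domains "
        + "on clusters.log_date = domains.log_date "
        + "and clusters.domain_id = domains.domain_id "
        + "where clusters.log_date = ?";

    // Prints one tab-separated line per (cluster, domain) row for the run date.
    static void printClusters(String jdbcUrl, String isoDate) throws SQLException {
        Connection con = DriverManager.getConnection(jdbcUrl);
        try {
            PreparedStatement ps = con.prepareStatement(SQL);
            // Assumption: log_date is a date column; adjust if the schema differs.
            ps.setDate(1, Date.valueOf(isoDate));
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("cluster_id") + "\t"
                        + rs.getString("domain_name") + "\t"
                        + rs.getString("second_level_domain_name"));
            }
        } finally {
            con.close(); // closing the connection also releases ps and rs
        }
    }

    public static void main(String[] args) throws SQLException {
        // args[0]: JDBC connection string, args[1]: run date such as 2010-11-13
        printClusters(args[0], args[1]);
    }
}
```

The connection string has the same form as DBINTERFACE_CONNECTINFO.  With older
JDBC drivers it may be necessary to call Class.forName("org.postgresql.Driver")
before opening the connection.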