
effect-workflows

DIG workflow processing for the EFFECT project.

Installation

  1. Download and install conda from https://www.continuum.io/downloads. Example for 64-bit Linux:
     a. wget https://repo.continuum.io/archive/Anaconda2-4.4.0-Linux-x86_64.sh
     b. bash Anaconda2-4.4.0-Linux-x86_64.sh
     c. source ~/.bashrc
  2. Install conda env - conda install -c conda conda-env
  3. Clone this repo: git clone https://github.com/usc-isi-i2/effect-workflows.git
  4. cd effect-workflows
  5. Create a user effect on HDFS. Create the folders /user/effect/data and /user/effect/workflow and give all users write permission on those folders (see the sketch after this list)
  6. Run the install script: ./install.sh
  7. To copy CDR data from an existing machine, follow the instructions in copyCDR.txt
  8. Import all workflows in oozie*.json into your Oozie instance using Hue and schedule the coordinators
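A minimal sketch of step 5, assuming the standard hadoop fs CLI; the permission scheme below (a permissive 777) is an assumption, so tighten it to match your cluster's policy:

# Create the effect user's HDFS folders (run as an HDFS superuser)
hadoop fs -mkdir -p /user/effect/data /user/effect/workflow
hadoop fs -chown -R effect /user/effect
# Assumption: world-writable so all users can write; adjust if your cluster is stricter
hadoop fs -chmod -R 777 /user/effect/data /user/effect/workflow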

NOTE: You should build the environment on the same hardware/OS that you are going to run the job on

Running the script to convert CSV, JSON, XML, or CDR data into a format suitable for Karma modeling

  1. Switch to the effect-env: source activate effect-env
  2. Execute:
python generateDataForKarmaModeling.py --input <input filename> --output <output filename> \
      --format <input format-csv/json/xml/cdr> --source <a name for the source> \
      --separator <column separator for CSV files>

Example Invocations:

python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/nvd/sample/nvdcve-2.0-2003.xml \
          --output nvd.jl --format xml --source nvd


python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/hackmageddon/sample/hackmageddon_20160730.csv \
          --output hackmageddon.jl --format csv --source hackmageddon


python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/hackmageddon/sample/hackmageddon_20160730.jl \
          --output hackmageddon.jl --format json --source hackmageddon
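To sanity-check a generated file, the commands below assume the .jl output is JSON Lines (one JSON object per line), which is what the extension suggests:

# Count the records and pretty-print the first one
wc -l nvd.jl
head -n 1 nvd.jl | python -m json.tool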

Loading data into Hive

  1. See hiveQueries.sql for example queries (a sketch of running it from the shell appears below)
  2. See copyCDR.txt to copy all data from one Hive install to another
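A sketch of running the queries file from the shell; hiveQueries.sql is in this repo, but the choice of CLI and the JDBC URL below are assumptions for your cluster:

# Run the example queries with the Hive CLI
hive -f hiveQueries.sql
# or through beeline (adjust the HiveServer2 URL for your cluster)
beeline -u jdbc:hive2://localhost:10000 -f hiveQueries.sql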

Running the workflow

  1. The install.sh script will build all jars and files required to run the workflow
  2. cp sparkRunCommands/run_effectWorkflow.sh ./
  3. Run ./run_effectWorkflow.sh. This loads data from the Hive table CDR, applies the Karma models to it, and saves the output to HDFS (a generic sketch of the underlying spark-submit appears below).
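For orientation only: the script wraps a Spark job, and the sketch below shows the general shape of such a spark-submit call, not the actual contents of run_effectWorkflow.sh (the jars, driver, and arguments are placeholders):

# Hypothetical shape only; see sparkRunCommands/run_effectWorkflow.sh for the real invocation
spark-submit --master yarn --deploy-mode cluster \
    --jars <karma and workflow jars built by install.sh> \
    <workflow driver script or jar> <its arguments>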

To load the data into Elasticsearch (ES):

  1. cp sparkRunCommands/run_effectWorkflow-es.sh ./
  2. Run ./run_effectWorkflow-es.sh (see the verification sketch below)
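To verify that documents were indexed, something like the following works against any Elasticsearch cluster; the host and port are assumptions for your setup:

# List indices and their document counts (adjust host/port for your cluster)
curl 'http://localhost:9200/_cat/indices?v'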

Extras

  • To remove the environment run conda env remove -n effect-env
  • To see all environments run conda env list

  • Run the Oozie workflow from the command line - it takes in job.properties and workflow.xml (see the sketch below)
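A minimal sketch of the standard Oozie CLI submission; the Oozie server URL is an assumption for your cluster, and workflow.xml is referenced from job.properties (typically via oozie.wf.application.path) rather than passed on the command line:

# Submit and start the workflow described by job.properties
oozie job -oozie http://localhost:11000/oozie -config job.properties -run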
