Got a shitload of logs? Septic Tank might be the tool for you.
Septic Tank is a pipeline based data processor. It is written in python, is fast, and has a low memory footprint. Each pipe in a pipeline has a very specific function. Pipes may be put together in a pipeline in any way you like.
git clone git://github.com/jbruce12000/septic-tank.git
cd septic-tank
virtualenv vseptic_tank
source vseptic_tank/bin/activate
pip install -r requirements.txt
cd septic_tank/
./make_sample_log.py | ./parse_sample_log.py |more
Inputs are used to get data into a pipeline:
stdin
file
zeromq
dirwatcher
Filters and parsers are used to modify the data in the pipeline in some way:
regular expression parser
date filter
grep / reverse grep filter
remove field filter
lowercase filter
add fields filter
Outputs are used to put data into some system outside the pipeline:
stdout
json
zeromq
solr
never put the same pipe in two different pipelines
to run all tests:
export PYTHONPATH=~/septic_tank/septic_tank/
cd ~/septic_tank/septic_tank/tests/
python -m unittest discover