SpamScope is an advanced spam analysis tool that uses Apache Storm with streamparse to process a stream of mails.
It's possible to analyze about 5 million mails per day (without Apache Tika analysis) on a server with 4 cores and 4 GB of RAM. If you enable Apache Tika, you can analyze about 1 million mails per day.
SpamScope uses Apache Storm, which lets you start small and scale horizontally as you grow: simply add more workers.
You can choose your mail input sources (with spouts) and your functionalities (with bolts). SpamScope comes with a tokenizer (splits a mail into tokens: headers, body, attachments), an attachments analyzer and a phishing analyzer (Who is the target of the mails? Is there malware in the attachments?), and JSON output.
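To make the idea of tokenizing a mail concrete, here is a minimal sketch that splits a raw mail into headers, body, and attachments using only Python's standard email package. It is not SpamScope's tokenizer: the function name tokenize, the output keys, and the sample mail are assumptions made for illustration.

```python
from email import message_from_string
from email import policy

# Hypothetical sample mail with one text body and one base64 attachment.
RAW_MAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Invoice
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Please see the attached invoice.
--BOUNDARY
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="invoice.txt"
Content-Transfer-Encoding: base64

aGVsbG8=
--BOUNDARY--
"""


def tokenize(raw_mail):
    """Split a raw mail into headers, body and attachments (illustrative only)."""
    msg = message_from_string(raw_mail, policy=policy.default)

    # Headers as a plain dict with lower-case names.
    headers = {name.lower(): str(value) for name, value in msg.items()}

    # Prefer the plain-text body, fall back to HTML if there is no plain part.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # Each attachment keeps its file name and its decoded bytes.
    attachments = [
        {"filename": part.get_filename(), "payload": part.get_payload(decode=True)}
        for part in msg.iter_attachments()
    ]
    return {"headers": headers, "body": body, "attachments": attachments}


tokens = tokenize(RAW_MAIL)
print(tokens["headers"]["subject"], len(tokens["attachments"]))
```

In a topology, each of the three tokens could then be sent to a different bolt for analysis.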
You can build your custom output bolts and store your data in Elasticsearch, Mongo, filesystem, etc.
With streamparse technology you can build your topology in Python and add and/or remove spouts and bolts.
SpamScope can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.
Here is an example of a raw mail and here is the SpamScope analysis output.
Fedele Mantuano (Twitter: @fedelemantuano)
For more details, please visit the wiki page.
Clone repository
git clone https://github.com/SpamScope/spamscope.git
Install the requirements listed in requirements.txt with python-pip:
pip install -r requirements.txt
There is another requirement: Faup. Install the faup tool and then its Python library with:
python setup.py install
All details are in the conf folder.
From SpamScope v1.1 you can filter out mails and attachments that have already been analyzed. If you enable the filter in the tokenizer section, you enable the RAM database, and SpamScope checks it to decide whether a mail/attachment has already been analyzed. If it has, SpamScope does not analyze it again and stores only the hashes.
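The filtering idea can be sketched as a small in-memory hash set. This is only an illustration under assumptions, not SpamScope's implementation: the SeenFilter class, the process function, and the choice of SHA-256 are hypothetical.

```python
import hashlib


class SeenFilter:
    """In-memory record of content hashes that were already analyzed."""

    def __init__(self):
        self._seen = set()

    def check(self, payload):
        """Return (digest, is_new) and remember the digest for later calls."""
        # SHA-256 is an assumption; the README only says "hashes".
        digest = hashlib.sha256(payload).hexdigest()
        is_new = digest not in self._seen
        self._seen.add(digest)
        return digest, is_new


def process(payload, seen, analyze):
    """Analyze new content; for already seen content keep only the hash."""
    digest, is_new = seen.check(payload)
    if is_new:
        return {"sha256": digest, "analysis": analyze(payload)}
    return {"sha256": digest}


seen = SeenFilter()
first = process(b"attachment bytes", seen, analyze=len)
second = process(b"attachment bytes", seen, analyze=len)
print("analysis" in first, "analysis" in second)
```

The second call skips the analysis because the same content was seen before, so only the hash is kept.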
SpamScope comes with two topologies:
- spamscope_debug
- spamscope_elasticsearch
and a general configuration file, spamscope.conf, in the conf/ folder.
To run a topology for debugging:
sparse run --name topology
If you want to submit a topology to Apache Storm:
sparse submit -f --name topology
It's very important to pass the configuration file to the sparse run and sparse submit commands. There is an open bug in streamparse:
sparse run --name topology -o "spamscope_conf=/etc/spamscope/spamscope.yml"
sparse submit -f --name topology -o "spamscope_conf=/etc/spamscope/spamscope.yml"
If you use the Elasticsearch output, I suggest using the Elasticsearch template that comes with SpamScope.
It's possible to change the default settings for all Apache Storm options. For SpamScope I suggest these options:
- topology.tick.tuple.freq.secs: interval, in seconds, at which all bolts reload their configuration
- topology.max.spout.pending: the Apache Storm framework throttles your spout as needed to meet the topology.max.spout.pending requirement
- topology.sleep.spout.wait.strategy.time.ms: maximum sleep time, in milliseconds, before emitting a new tuple (mail)
For SpamScope I tested these values to avoid failed tuples:
topology.tick.tuple.freq.secs: 60
topology.max.spout.pending: 100
topology.sleep.spout.wait.strategy.time.ms: 10
If Apache Tika is enabled:
topology.max.spout.pending: 10
To submit the topology with these options:
sparse submit -f --name topology -o "spamscope_conf=/etc/spamscope/spamscope.yml" -o "topology.tick.tuple.freq.secs=60" -o "topology.max.spout.pending=100" -o "topology.sleep.spout.wait.strategy.time.ms=10"
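When many options are involved, it can be easier to keep them in one place and generate the command line. The sketch below builds the same sparse submit invocation from a dictionary; the helper name build_submit_command is hypothetical, and the script only prints the command instead of running it.

```python
import shlex


def build_submit_command(name, options):
    """Build the argument list for `sparse submit` with one -o flag per option."""
    args = ["sparse", "submit", "-f", "--name", name]
    for key, value in options.items():
        args += ["-o", f"{key}={value}"]
    return args


# Values taken from the tested settings above.
options = {
    "spamscope_conf": "/etc/spamscope/spamscope.yml",
    "topology.tick.tuple.freq.secs": 60,
    "topology.max.spout.pending": 100,
    "topology.sleep.spout.wait.strategy.time.ms": 10,
}

command = build_submit_command("topology", options)
# shlex.join (Python 3.8+) quotes arguments only where the shell needs it.
print(shlex.join(command))
```

If Apache Tika is enabled, only the topology.max.spout.pending entry needs to change to 10.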
For more details you can refer here.
It's possible to add the output of the Apache Tika analysis to the results (for mail attachments). You should enable it in the attachments section. SpamScope uses the Tika-app JAR with the tika-app Python library.
It's possible to add a VirusTotal report to the results (for mail attachments). You may need a private API key.
It's possible to use a complete Docker image with Apache Storm and SpamScope. Get it here. There are two tags: latest and develop.