
loryka/spamscope

 
 


Overview

SpamScope is an advanced spam analysis tool that uses Apache Storm with streamparse to process a stream of mails.

It can analyze about 5 million mails per day (without Apache Tika analysis) on a server with 4 cores and 4 GB of RAM. With Apache Tika enabled, it can analyze about 1 million mails per day.
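To put these daily figures in perspective, the short sketch below converts them into average per-second rates. It is plain arithmetic on the numbers quoted above, assuming load is spread evenly over 24 hours; real traffic is usually burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def mails_per_second(mails_per_day: int) -> float:
    """Convert a daily mail volume into an average per-second rate."""
    return mails_per_day / SECONDS_PER_DAY


# Figures quoted in the overview (4 cores, 4 GB of RAM).
without_tika = mails_per_second(5_000_000)
with_tika = mails_per_second(1_000_000)

print(f"without Tika: ~{without_tika:.1f} mails/s")
print(f"with Tika:    ~{with_tika:.1f} mails/s")
```

That is roughly 58 mails per second without Apache Tika and roughly 12 with it.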

[Image: Schema topology]

Distributed

SpamScope uses Apache Storm, which allows you to start small and scale horizontally as you grow. Simply add more workers.

Flexibility

You can choose your mail input sources (with spouts) and your functionalities (with bolts). SpamScope comes with a tokenizer (which splits a mail into tokens: headers, body, attachments), an attachments analyzer and a phishing analyzer (What is the target of the mails? Is there malware in an attachment?), and JSON output.
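As an illustration of what splitting a mail into headers, body and attachments can look like, here is a minimal sketch using only Python's standard email package. It is not SpamScope's tokenizer: the function name tokenize, the sample message and the shape of the returned dictionary are all assumptions made for this example.

```python
from email import message_from_string

# A small hand-written multipart mail used only for this example.
RAW_MAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Invoice
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Please see the attached invoice.
--XYZ
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="invoice.pdf"
Content-Transfer-Encoding: base64

aGVsbG8=
--XYZ--
"""


def tokenize(raw: str) -> dict:
    """Split a raw mail into headers, text body and attachments (hypothetical helper)."""
    msg = message_from_string(raw)
    headers = dict(msg.items())  # note: repeated headers keep only the last value
    body_parts = []
    attachments = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename()
        if filename:
            attachments.append({"filename": filename, "payload": payload})
        elif part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))
    return {
        "headers": headers,
        "body": "\n".join(body_parts),
        "attachments": attachments,
    }


tokens = tokenize(RAW_MAIL)
print(tokens["headers"]["Subject"])
print([a["filename"] for a in tokens["attachments"]])
```

Each token can then be handed to a separate analysis step, which is the idea behind having separate bolts for attachments and phishing.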

Store where you want

You can build your custom output bolts and store your data in Elasticsearch, Mongo, filesystem, etc.
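The core of a filesystem output step could be as small as the sketch below. It is plain Python, not a real streamparse bolt, and the names store_result and message_id are hypothetical choices for this example; a real output bolt would call something like this from its tuple-processing method.

```python
import json
from pathlib import Path


def store_result(result: dict, output_dir: str) -> Path:
    """Write one analysis result as a JSON file and return its path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "message_id" is an assumed key used to name the file, not a SpamScope field.
    name = result.get("message_id", "unknown")
    path = out_dir / f"{name}.json"
    path.write_text(json.dumps(result, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Swapping the body of this function for an Elasticsearch or Mongo client call would change the storage backend without touching the rest of the topology.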

Build your topology

With streamparse technology you can build your topology in Python and add or remove spouts and bolts.

Apache 2 Open Source License

SpamScope can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.

Output example

Here is an example of a raw mail, and here is the corresponding SpamScope analysis output.

Authors

Main Author

Fedele Mantuano (Twitter: @fedelemantuano)

Installation

For more details, please visit the wiki page.

Clone the repository:

git clone https://github.com/SpamScope/spamscope.git

Install the requirements listed in requirements.txt with pip:

pip install -r requirements.txt

There is one more requirement: Faup. Install the faup tool, then install its Python library with:

python setup.py install

Configuration

All details are in the conf folder.

Starting with SpamScope v1.1, you can filter out mails and attachments that have already been analyzed. Enabling the filter in the tokenizer section turns on the RAM database, which SpamScope checks to decide whether a mail or attachment has already been analyzed. If it has, SpamScope does not analyze it again and stores only its hashes.
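The idea behind such a filter can be sketched as a bounded in-memory set of hashes. This is an illustration only, not SpamScope's implementation: the class name SeenFilter, the choice of SHA-256 and the eviction policy are assumptions.

```python
import hashlib
from collections import OrderedDict
from typing import Tuple


class SeenFilter:
    """Bounded in-memory record of hashes; the oldest entries are evicted first."""

    def __init__(self, max_items: int = 1000):
        self.max_items = max_items
        self._seen = OrderedDict()  # insertion order doubles as age

    def already_analyzed(self, payload: bytes) -> Tuple[bool, str]:
        """Return (seen_before, hash) and remember the hash if it is new."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            return True, digest
        self._seen[digest] = None
        if len(self._seen) > self.max_items:
            self._seen.popitem(last=False)  # drop the oldest hash
        return False, digest


mail_filter = SeenFilter(max_items=2)
first_seen, _ = mail_filter.already_analyzed(b"mail one")
second_seen, _ = mail_filter.already_analyzed(b"mail one")
print(first_seen, second_seen)
```

When the first value returned is true, the caller can skip the expensive analysis and keep only the hash. Bounding the set keeps memory use predictable, at the cost of occasionally re-analyzing an old mail.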

Usage

SpamScope comes with two topologies:

  • spamscope_debug
  • spamscope_elasticsearch

and a general configuration file, spamscope.conf, in the conf/ folder.

To run a topology for debugging:

sparse run --name topology

If you want to submit the topology to Apache Storm:

sparse submit -f --name topology

Important

It is very important to pass the configuration file to the sparse run and sparse submit commands, because there is an open bug in streamparse:

  • sparse run --name topology -o "spamscope_conf=/etc/spamscope/spamscope.yml"
  • sparse submit -f --name topology -o "spamscope_conf=/etc/spamscope/spamscope.yml"

If you use the Elasticsearch output, I suggest using the Elasticsearch template that comes with SpamScope.

Apache Storm settings

It's possible to change the default settings for all Apache Storm options. For SpamScope, I suggest paying attention to these options:

  • topology.tick.tuple.freq.secs: how often, in seconds, all bolts reload their configuration
  • topology.max.spout.pending: the Apache Storm framework throttles your spout as needed to meet the topology.max.spout.pending requirement
  • topology.sleep.spout.wait.strategy.time.ms: maximum time, in milliseconds, that the spout sleeps before emitting a new tuple (mail)

For SpamScope I tested these values to avoid failed tuples:

topology.tick.tuple.freq.secs: 60
topology.max.spout.pending: 100
topology.sleep.spout.wait.strategy.time.ms: 10

If Apache Tika is enabled:

topology.max.spout.pending: 10

To submit the topology with these options:

sparse submit -f --name topology -o "spamscope_conf=/etc/spamscope/spamscope.yml" -o "topology.tick.tuple.freq.secs=60" -o "topology.max.spout.pending=100" -o "topology.sleep.spout.wait.strategy.time.ms=10"
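Because the command grows with every -o option, it can be convenient to build it from a dictionary. The helper below is a hypothetical convenience, not part of SpamScope or streamparse; it only assembles the same sparse submit arguments shown above.

```python
import shlex
from typing import Dict, List


def build_submit_command(name: str, options: Dict[str, object]) -> List[str]:
    """Assemble 'sparse submit' arguments with one -o flag per option."""
    cmd = ["sparse", "submit", "-f", "--name", name]
    for key, value in options.items():
        cmd += ["-o", f"{key}={value}"]
    return cmd


# Values taken from the settings listed in this section.
options = {
    "spamscope_conf": "/etc/spamscope/spamscope.yml",
    "topology.tick.tuple.freq.secs": 60,
    "topology.max.spout.pending": 100,
    "topology.sleep.spout.wait.strategy.time.ms": 10,
}

command = build_submit_command("topology", options)
print(shlex.join(command))  # shell-ready string for copy and paste
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where streamparse is installed. To use the Apache Tika value, set topology.max.spout.pending to 10 in the dictionary.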

For more details you can refer here.

Apache Tika

It's possible to add the output of Apache Tika analysis to the results (for mail attachments). You should enable it in the attachments section. SpamScope uses the Tika-app JAR with the tika-app Python library.

Virustotal

It's possible to add the Virustotal report to the results (for mail attachments). You may need a private API key.

Docker image

It's possible to use a complete Docker image with Apache Storm and SpamScope. Take it here. There are two tags: latest and develop.

[Images: Apache Storm, SpamScope, SpamScope Topology]
