Skip to content

pombredanne/fsf

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

File Scanning Framework (FSF)

Introduction

###What is the ‘file scanning framework’?###

The FSF is a modular solution that enables analysts to extend the utility of the Yara signatures they write and define actionable intelligence within a file. This is accomplished by recursively scanning a file and looking for opportunities to extract file objects using a combination of Yara signatures (to define opportunities) and programmable logic (to define what to do with the opportunity). The framework allows you to build out your intelligence capability by empowering you to apply observations wrought out of the analytical process…

Okay that’s a mouthful – but think about it – if you see that some pattern (maybe a string or a byte sequence) that represents some concept or behavior; through the use of the framework, you are positioned to capture that observation and apply it to certain file types that meet your criteria.

Some examples might be:

  • Uncompressing ZIP files and scanning their contents.
  • Decoding a malware config file that matches a specific signature, then parsing the meta data.
  • General metadata enrichment for any file type.
  • Logging the compile time for any EXE
  • Logging the author field for office documents
  • So much more...

You can extend and define what’s important by writing modules that expose pieces of metadata that inform analysis and expose new sub objects of a file! These sub objects are recursively scanned through the same gauntlet, further enhancing both Yara and module utility.

###If we alert on a signature, how will we know?###

This decision is left up to you since there are many ways to do this. One suggestion might be to aggregate and index the scan.log data using something like Splunk or with an ELK Stack. You can then build your alerting into the capability.

###Is there a way I can take action on a specific rule hit from within the FSF? Like print out metadata for certain file types?###

This is precisely what modules are for! Module development driven by analyst observations is a cornerstone of the FSF!

###This is pretty cool – but I don’t really know that much about Yara?###

Check out the Yara official documentation for more information and examples.

###What are the tools limitations?###

  • Since we recursively process objects, a MIN_DEPTH configurable value is enforced.
  • There is a TIMEOUT value that is imposed on each module run that may not be exceeded or the program terminates.

###Is there a general process flow that can help me understand what's going on?###

Yes. For a complete process flow, refer to the graphic found at docs/FSF Process.png. You may also find a graphic depicting a high level overview helpful as well at [docs/FSF Overview.png] (https://github.com/EmersonElectricCo/fsf/blob/master/docs/FSF%20Overview.png)

###Is there helpful documentation on how to write modules?###

Absolutely. Check out the docs/modules.md for a great primer on how to get started.

###How does this scale up if I want to 'scan all the things'?###

The server is parallelized and supports running multiple jobs at the same time. As an example, I've provided one possible way you can accomplish this by integrating with Bro, extracting files, and sending them over to the FSF server. You can find this at the bottom of docs/modules.md under the heading 'Automated File Extraction'.

Some key advantages to Bro integration are:

  • Ability to direct files to a given FSF scanner node on a per sensor basis
  • Use of the Bro scripting language to help optimize inputs, some examples might include:
  • Limit sending of a file we've already seen for a certain time interval to avoid redundancy (based on MD5, etc)
  • Limit the size of the file you extracting if desired
  • Control over MIME types you care to pass on to FSF

###What if I want to do load balancing across several FSF servers?###

You can easily integrate different load balancing solutions with FSF if you wish. Doing so, combined with the servers parallel processing for each request has many performance and reliability benefits. It also gives you the flexibility to do load balancing the way you want to, like using equal distribution, grouping, fail over, some combination and more...

For example, you can use the popular utility Balance to configure simple load balancing between FSF nodes with one simple command.

balance -f 5800 10.0.3.5 10.0.3.6

The above tells balance to run in the foreground on port 5800, and equally distribute requests between the two hosts specified (10.0.3.5 and 10.0.3.6). By default, the requests will be forwarded on port 5800 as well unless otherwise specified. Now we can just point our FSF clients to our load balancer and let it do the work for us.

Of course, you can use a different load balancing solution you'd like, this is just a quick example. You can even specify multiple FSF servers/balancers using the client config file if desired. When doing this, the FSF server chosen for the request is done at random allowing for some rudimentary balancing.

###How can I get access to the subobjects that are recursively processed?###

Ah, so are you tired of using hachoir-subfile + dd to carve out files during static analysis? Or perhaps running unzip or unrar to get decompressed files, upx -d to get unpacked files, or OfficeMalScan to get macros over and over is getting old?

Well you can certainly use FSF to do the heavy lifting if you'd like. It incorporates the components that make the above tools so helpful into the framework. For other use cases, all you you need is to ensure the intelligence to do what you want is built into the framework (Yara + Module)! Several open source modules included with the package help with this. Just use the --full option when invoking the client and all the subobjects will collect in a new directory.

Word of caution however, make sure you understand how to do it the hard way first!

fsf_client.py macro_test --full
...normal report information...
Subobjects of macro_test successfully written to: fsf_dump_1446676465_6ba593d8d5defd6fbaa96a1ef2bc601d

###Okay I think I understand, but I'd like visual representation on what a 'report' looks like?###

Take a look a the following graphic in docs/Example Test.png. That represents the file test.zip which may be found in docs/Test.zip. That file, when recursively processed using FSF outputs what's found in docs/Test.json.

Each object within this file represents an opportunity to collect/enrich intelligence to drive more informed detections, adversary awareness, correlations, and overall analytical tradecraft.

###There's a lot of JSON output here... What tools exist to help me interact with this data effectively over the command line?###

JQ is a great utility to help work with JSON data. You might find yourself wanting to filter out certain modules when reviewing FSF JSON output for intel gain. Please refer to the docs/JQ_Examples.md, for some helpful 'FSF specific' examples to accommodate such inquiries. I'd also suggest taking a peek at the JQ Cookbook for more great examples.

Installation

FSF has been tested to work successfully on CentOS and Ubuntu distributions.

Please refer to docs/INSTALL.md for a detailed, step-by-step guide on how to get started with either platform.

Alternatively, you can check out our Dockerfile if you'd like.

Setup

Check your configuration settings

About

File Scanning Framework

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%