###What is the ‘file scanning framework’?###
The FSF is a modular solution that enables analysts to extend the utility of the Yara signatures they write and define actionable intelligence within a file. This is accomplished by recursively scanning a file and looking for opportunities to extract file objects using a combination of Yara signatures (to define opportunities) and programmable logic (to define what to do with the opportunity). The framework allows you to build out your intelligence capability by empowering you to apply observations wrought out of the analytical process…
Okay, that’s a mouthful, but think about it: if you see some pattern (maybe a string or a byte sequence) that represents some concept or behavior, the framework positions you to capture that observation and apply it to the file types that meet your criteria.
Some examples might be:
- Uncompressing ZIP files and scanning their contents.
- Decoding a malware config file that matches a specific signature, then parsing the metadata.
- General metadata enrichment for any file type.
- Logging the compile time for any EXE
- Logging the author field for office documents
- So much more...
You can extend and define what’s important by writing modules that expose pieces of metadata that inform analysis and expose new sub objects of a file! These sub objects are recursively scanned through the same gauntlet, further enhancing both Yara and module utility.
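To make the recursion concrete, here is a minimal sketch of the idea, not FSF's actual code. A stand-in signature check takes the place of Yara, and every name in it (`signature_hits`, `extract_zip`, `DISPATCH`, `scan`, `make_zip`, the `is_zip` signature, and the `MAX_DEPTH` value) is hypothetical and chosen for illustration only.

```python
import io
import zipfile

MAX_DEPTH = 5  # hypothetical recursion limit for this sketch


def signature_hits(data):
    """Stand-in for Yara: return the names of the 'signatures' that match."""
    hits = []
    if data.startswith(b"PK\x03\x04"):  # ZIP local file header magic
        hits.append("is_zip")
    return hits


def extract_zip(data):
    """Module: return metadata about the archive plus its members as sub objects."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        members = {name: archive.read(name) for name in archive.namelist()}
    return {"file_count": len(members)}, members


# Signature name -> module to run when that signature matches.
DISPATCH = {"is_zip": extract_zip}


def scan(data, depth=0):
    """Scan one object, then send every extracted sub object back through scan()."""
    report = {"size": len(data), "signatures": signature_hits(data)}
    if depth >= MAX_DEPTH:
        report["note"] = "depth limit reached"
        return report
    for hit in report["signatures"]:
        metadata, children = DISPATCH[hit](data)
        report[hit] = metadata
        report["children"] = {
            name: scan(child, depth + 1) for name, child in children.items()
        }
    return report


def make_zip(name, payload):
    """Build a small in-memory ZIP so the sketch runs without touching disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


# A ZIP inside a ZIP: the inner archive is found and scanned automatically.
nested = make_zip("inner.zip", make_zip("note.txt", b"hello"))
report = scan(nested)
```

The point of the design is that a module only has to return its sub objects; it never needs to know what they are, because the same signature check decides what happens to them next.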
###If we alert on a signature, how will we know?###
This decision is left up to you since there are many ways to do this. One suggestion might be to aggregate and index the scan.log data using something like Splunk or with an ELK Stack. You can then build your alerting into the capability.
###Is there a way I can take action on a specific rule hit from within the FSF? Like print out metadata for certain file types?###
This is precisely what modules are for! Module development driven by analyst observations is a cornerstone of the FSF!
###This is pretty cool – but I don’t really know that much about Yara?###
Check out the Yara official documentation for more information and examples.
###What are the tool's limitations?###
- Since we recursively process objects, a `MAX_DEPTH` configurable value is enforced.
- There is a `TIMEOUT` value that is imposed on each module run that may not be exceeded or the program terminates.
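For a feel of what a per-module time limit involves, here is a rough sketch using only the Python standard library. It is an assumption about how such a limit could be built, not a description of how FSF does it: `run_module` and the `TIMEOUT` value below are hypothetical, and this version raises an exception rather than terminating the program.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

TIMEOUT = 0.2  # seconds; hypothetical value for this sketch


def run_module(module, data, timeout=TIMEOUT):
    """Run module(data); raise RuntimeError if it takes longer than timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(module, data)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise RuntimeError("module exceeded %.2fs timeout" % timeout)
    finally:
        # Do not block here waiting for a runaway module to finish.
        executor.shutdown(wait=False)
```

Note that a thread cannot be forcibly stopped, so a stuck module keeps running in the background here; a process-based approach would be needed to actually kill it.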
###Is there a general process flow that can help me understand what's going on?###
Yes. For a complete process flow, refer to the graphic found at docs/FSF Process.png. You may also find the high-level overview graphic helpful: [docs/FSF Overview.png](https://github.com/EmersonElectricCo/fsf/blob/master/docs/FSF%20Overview.png)
###Is there helpful documentation on how to write modules?###
Absolutely. Check out the docs/modules.md for a great primer on how to get started.
###How does this scale up if I want to 'scan all the things'?###
The server is parallelized and supports running multiple jobs at the same time. As an example, I've provided one possible way you can accomplish this by integrating with Bro, extracting files, and sending them over to the FSF server. You can find this at the bottom of docs/modules.md under the heading 'Automated File Extraction'.
Some key advantages to Bro integration are:
- Ability to direct files to a given FSF scanner node on a per-sensor basis
- Use of the Bro scripting language to help optimize inputs. Some examples might include:
  - Limit sending of a file we've already seen for a certain time interval to avoid redundancy (based on MD5, etc.)
  - Limit the size of the file you are extracting if desired
  - Control over the MIME types you care to pass on to FSF
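The three input filters listed above are simple enough to sketch. The following expresses the logic in Python purely for illustration; in a real deployment it would live in a Bro script. The function name `should_forward` and all three thresholds are assumptions, not values taken from FSF or Bro.

```python
import hashlib
import time

SEEN_INTERVAL = 300            # seconds before the same file may be sent again (assumed)
MAX_SIZE = 10 * 1024 * 1024    # largest file to forward, in bytes (assumed)
WANTED_MIME = {"application/zip", "application/pdf"}  # assumed allow list

_last_sent = {}  # MD5 hex digest -> time the file was last forwarded


def should_forward(data, mime_type, now=None):
    """Return True if the extracted file passes all three filters."""
    now = time.time() if now is None else now
    if mime_type not in WANTED_MIME or len(data) > MAX_SIZE:
        return False
    digest = hashlib.md5(data).hexdigest()
    last = _last_sent.get(digest)
    if last is not None and now - last < SEEN_INTERVAL:
        return False  # seen recently; skip the redundant scan
    _last_sent[digest] = now
    return True
```

Checking MIME type and size before hashing keeps the cheap tests first, so files that will be rejected anyway are never hashed.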
###What if I want to do load balancing across several FSF servers?###
You can easily integrate different load balancing solutions with FSF if you wish. Doing so, combined with the server's parallel processing of each request, has many performance and reliability benefits. It also gives you the flexibility to do load balancing the way you want to: equal distribution, grouping, failover, or some combination of these.
For example, you can use the popular utility Balance to configure simple load balancing between FSF nodes with one simple command:

    balance -f 5800 10.0.3.5 10.0.3.6
The above tells balance to run in the foreground on port 5800, and equally distribute requests between the two hosts specified (10.0.3.5 and 10.0.3.6). By default, the requests will be forwarded on port 5800 as well unless otherwise specified. Now we can just point our FSF clients to our load balancer and let it do the work for us.
Of course, you can use any load balancing solution you'd like; this is just a quick example. You can even specify multiple FSF servers/balancers in the client config file if desired. When you do this, the FSF server for each request is chosen at random, allowing for some rudimentary balancing.
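Random selection among several configured servers amounts to very little code. This is a sketch of the idea only; the `SERVERS` list and `pick_server` are hypothetical names and do not reflect the actual layout of the client config file.

```python
import random

# Hypothetical client-side list of FSF servers or balancers.
SERVERS = [("10.0.3.5", 5800), ("10.0.3.6", 5800)]


def pick_server(servers=SERVERS, rng=random):
    """Choose one (host, port) pair at random for this request."""
    return rng.choice(servers)


host, port = pick_server()
```

Over many requests this spreads load roughly evenly, but it knows nothing about server health or current load, which is why a dedicated balancer is still worth considering.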
###How can I get access to the subobjects that are recursively processed?###
Ah, so are you tired of using `hachoir-subfile` + `dd` to carve out files during static analysis? Or perhaps running `unzip` or `unrar` to get decompressed files, `upx -d` to get unpacked files, or `OfficeMalScan` to get macros over and over is getting old?
Well, you can certainly use FSF to do the heavy lifting if you'd like. It incorporates the components that make the above tools so helpful into the framework. For other use cases, all you need is to ensure the intelligence to do what you want is built into the framework (Yara + Module)! Several open source modules included with the package help with this. Just use the `--full` option when invoking the client and all the subobjects will collect in a new directory.
A word of caution, however: make sure you understand how to do it the hard way first!

    fsf_client.py macro_test --full
    ...normal report information...
    Subobjects of macro_test successfully written to: fsf_dump_1446676465_6ba593d8d5defd6fbaa96a1ef2bc601d
###Okay I think I understand, but I'd like a visual representation of what a 'report' looks like?###
Take a look at the graphic in docs/Example Test.png. It represents the file `test.zip`, which may be found in docs/Test.zip. That file, when recursively processed using FSF, outputs what's found in docs/Test.json.
Each object within this file represents an opportunity to collect/enrich intelligence to drive more informed detections, adversary awareness, correlations, and overall analytical tradecraft.
###There's a lot of JSON output here... What tools exist to help me interact with this data effectively over the command line?###
JQ is a great utility to help work with JSON data. You might find yourself wanting to filter out certain modules when reviewing FSF JSON output for intel gain. Please refer to docs/JQ_Examples.md for some helpful 'FSF specific' examples to accommodate such inquiries. I'd also suggest taking a peek at the JQ Cookbook for more great examples.
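If you would rather do the same kind of filtering from Python, a small recursive helper is enough. This is only a sketch: it assumes module results appear in the report as keys named after the module, and the report fragment and the names `MODULE_A` and `MODULE_B` are made up for illustration.

```python
import json


def drop_keys(node, unwanted):
    """Return a copy of node with every key in unwanted removed at any depth."""
    if isinstance(node, dict):
        return {k: drop_keys(v, unwanted) for k, v in node.items() if k not in unwanted}
    if isinstance(node, list):
        return [drop_keys(item, unwanted) for item in node]
    return node


# Hypothetical report fragment; real FSF output has many more fields.
raw = '{"Object": {"MODULE_A": {"x": 1}, "MODULE_B": {"MODULE_A": 2, "y": 3}}}'
trimmed = drop_keys(json.loads(raw), {"MODULE_A"})
print(json.dumps(trimmed, indent=2))
```

Because sub objects are nested inside their parents, the helper walks the whole tree rather than only the top level.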
FSF has been tested to work successfully on CentOS and Ubuntu distributions.
Please refer to docs/INSTALL.md for a detailed, step-by-step guide on how to get started with either platform.
Alternatively, you can check out our Dockerfile if you'd like.
Check your configuration settings
- Server-side
  - In [fsf-server/conf/config.py](https://github.com/EmersonElectricCo/fsf/blob/master/fsf-server/conf/config.py)
    - Make sure you are pointing to your master Yara signature file using the full path. See [fsf-server/yara/rules.yara](https://github.com/EmersonElectricCo/fsf/blob/master/fsf-server/yara/rules.yara)
    - Set the logging directory; make sure it exists and ensure you have permissions to write to it
  - In fsf-server, start up the server using `./main.py start` and it will daemonize
- Client-side
  - In fsf-client/conf/config.py
    - Point to your server(s) being used to scan files
  - Submit a file with `fsf_client.py <PATH>`; you can use a wildcard to scan all of the files in a directory