thijsh/fpdetective

fpdetective

A framework for conducting large scale web privacy studies.

Installation

git clone https://github.com/fpdetective/fpdetective.git
cd fpdetective

At this point, you have two options:

  1. Run ./setup.sh to use FPDetective on your computer
  2. Follow the instructions for setting up a VM to run FPDetective in a virtual machine

Please note that setup.sh will download browsers and other binaries used by FPDetective. This may take a while, depending on your connection.

Get Started

Command line parameters

Below we give a description of the parameters that are passed to the agents.py module.

  • --url_file: path to the file containing the list of URLs to crawl
  • --stop: index of the url_file where the crawl will stop
  • --start (optional): index of the url_file where the crawl will start
  • --type: the agent can be:
    • lazy: uses PhantomJS and visits homepages
    • clicker: uses PhantomJS and clicks a number of links
    • chrome_lazy: uses Chromium and visits homepages
    • chrome_clicker: uses Chromium and clicks a number of links
    • dnt: visits homepages with a DNT header set to 1
    • screenshot: visits homepages and takes a screenshot
  • --max_proc: maximum number of processes that will run in parallel
  • --fc_debug: boolean to set the system environment variable that logs the OS font requests
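
The parameters above can be sketched as a command line parser. This is a hypothetical illustration, not the actual code in agents.py: the flag names follow the list above and the example command in this README, while the defaults, the `agent_type` destination name and the treatment of --fc_debug as an on/off switch are assumptions.

```python
import argparse

# Agent types as listed in this README.
AGENT_TYPES = ["lazy", "clicker", "chrome_lazy", "chrome_clicker",
               "dnt", "screenshot"]


def build_parser():
    """Build a parser that mirrors the documented parameters (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Hypothetical sketch of the crawler options")
    parser.add_argument("--url_file", required=True,
                        help="file containing the list of URLs to crawl")
    parser.add_argument("--stop", type=int, required=True,
                        help="index in the URL file where the crawl stops")
    # Assumption: the crawl starts at the first entry when --start is omitted.
    parser.add_argument("--start", type=int, default=1,
                        help="index in the URL file where the crawl starts")
    parser.add_argument("--type", dest="agent_type", choices=AGENT_TYPES,
                        required=True, help="kind of agent to run")
    # Assumption: a single process unless told otherwise.
    parser.add_argument("--max_proc", type=int, default=1,
                        help="maximum number of parallel processes")
    # Assumption: modelled as a switch; the real option may expect a value.
    parser.add_argument("--fc_debug", action="store_true",
                        help="log the OS font requests")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Restricting --type with `choices` makes the parser reject a misspelled agent name before any browser is launched.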

How to launch a simple crawl

You can use the following command to crawl the homepages of the Alexa top 100 sites with 10 browsers running in parallel:

  • Change to the FPDetective source directory (~/fpbase/src/crawler) and run the command:
python agents.py --url_file ~/fpbase/run/top-1m.csv --stop 100 --type lazy --max_proc 10
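
To illustrate how --start and --stop bound a crawl, here is a minimal sketch of selecting entries from a URL list by index. It is not taken from the FPDetective source. It assumes the file uses "rank,domain" rows, that indices are 1-based and inclusive, and that an http:// prefix is added to each domain. The function name `select_urls` is hypothetical.

```python
import csv
import io


def select_urls(csv_text, stop, start=1):
    """Return URLs for the rows whose 1-based index lies in [start, stop]."""
    selected = []
    rows = csv.reader(io.StringIO(csv_text))
    for index, row in enumerate(rows, start=1):
        if index > stop:
            break  # everything past --stop is ignored
        if index < start or not row:
            continue  # skip rows before --start and empty lines
        # Assumption: the domain is the last column of a "rank,domain" row.
        selected.append("http://" + row[-1].strip())
    return selected


if __name__ == "__main__":
    sample = "1,google.com\n2,facebook.com\n3,youtube.com\n4,yahoo.com\n"
    print(select_urls(sample, stop=3, start=2))
    # prints ['http://facebook.com', 'http://youtube.com']
```

With --stop 100 and no --start, a selection of this kind would cover the first 100 entries of the list.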

Once the crawl is finished, you can check the log in run/logs/latest or connect to the DB using phpMyAdmin (the password for the root user is fpdetective).

Using FPDetective with a VM

You can follow these instructions to set up a VM and use FPDetective independently of the configuration of your operating system.

Patches for Chromium & PhantomJS browsers

You can use the following patches to build modified Chromium and PhantomJS browsers from source. Please consult the instructions for further explanation.
