fpdetective

A framework for conducting large scale web privacy studies.

Installation

git clone https://github.com/fpdetective/fpdetective.git
cd fpdetective

Then follow the instructions for setting up a VM to run FPDetective in a virtual machine.

Get Started

Command line parameters

The agents.py module accepts the following parameters:

  • --url_file: path to the file containing the list of URLs to crawl
  • --stop: index of the url_file where the crawl will stop
  • --start (optional): index of the url_file where the crawl will start
  • --type: the agent can be:
    • lazy: uses PhantomJS and visits homepages
    • clicker: uses PhantomJS and clicks a number of links
    • chrome_lazy: uses Chrome and visits homepages
    • chrome_clicker: uses Chromium and clicks a number of links
    • dnt: visits homepages with the DNT header set to 1
    • screenshot: visits homepages and takes a screenshot
  • --max_proc: maximum number of processes that will run in parallel
  • --fc_debug: boolean to set the system environment variable that logs the OS font requests

How to launch a simple crawl

You can use the following command to crawl the homepages of the Alexa top 100 sites with 10 browsers running in parallel:

  • Change to the FPDetective source directory (~/fpbase/src/crawler) and run:
python agents.py --url_file ~/fpbase/run/top-1m.csv --stop 100 --type lazy --max_proc 10

Once the crawl is finished, you can check the log in run/logs/latest or connect to the DB using phpMyAdmin (the password for the root user is: fpdetective).
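
As a rough illustration of how start and stop indices can select a slice of the URL list, the sketch below reads rows from a file in the Alexa "rank,domain" CSV layout. The function name read_urls, the 1-based inclusive indexing, and the "http://" prefix are assumptions for this example; the crawler's own handling may differ.

```python
import csv
import io
from itertools import islice


def read_urls(csv_file, start=1, stop=None):
    """Yield (rank, url) for rows start..stop of a 'rank,domain' CSV.

    Assumption: indices are 1-based and inclusive; stop=None reads to the end.
    """
    rows = csv.reader(csv_file)
    for row in islice(rows, start - 1, stop):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield int(row[0]), "http://" + row[1]


if __name__ == "__main__":
    # Small in-memory stand-in for top-1m.csv, using placeholder domains.
    sample = io.StringIO("1,example.com\n2,example.org\n3,example.net\n")
    for rank, url in read_urls(sample, start=2, stop=3):
        print(rank, url)
```

Reading lazily with islice avoids loading the full one-million-line file when only the first few hundred entries are needed.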

Patches for Chromium & PhantomJS browsers

You can use the following patches to build modified Chromium and PhantomJS browsers from source. Please consult the instructions for further explanation.
