
Code release for: Cookies that give you away: The surveillance implications of web tracking


englehardt/cookies-that-give-you-away


Cookies That Give You Away: The Surveillance Implications of Web Tracking

This is the public code release for our WWW 2015 paper. You should also check out the paper, the presentation, and the data.

Data Collection

The measurements were taken on three Amazon EC2 instances using OpenWPM v0.1, which is included in this repo.

  • run_crawl.py - Runs a single crawl; change the settings here for each configuration. Only one of the configurations from the paper is included.
  • run_network_measurment.py / get_dns.py / get_traceroute.py - Run these after the crawl, on the same instance. They perform a DNS lookup for each unique hostname seen during the crawl and run a traceroute to each one.
  • make_profiles.py - Creates Alexa profiles by randomly subsampling the top Alexa sites for each country from alexa_top_500_{IE,JP,US}.txt.
  • make_full_list.py - Create union_of_sites.txt, a list of sites to feed into synchronized crawls for ID detection.
  • profiles - Contains the 25 AOL profiles used in the paper, as well as three Alexa models as pickled Python objects.
  • automation - OpenWPM v0.1
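
As a rough illustration of the profile step, the sketch below randomly subsamples a ranked site list without replacement and then takes the deduplicated union of the resulting profiles, which is one plausible way to build a list like union_of_sites.txt. The one-site-per-line file format, the helper names (load_sites, make_profile, union_of_sites), the seed, and the profile size are all hypothetical; they are not taken from make_profiles.py or make_full_list.py.

```python
import random
from typing import Dict, List


def load_sites(path: str) -> List[str]:
    """Read one site per line, skipping blank lines (assumed file format)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def make_profile(sites: List[str], size: int, rng: random.Random) -> List[str]:
    """Randomly subsample `size` distinct sites, without replacement."""
    return rng.sample(sites, size)


def union_of_sites(profiles: Dict[str, List[str]]) -> List[str]:
    """Sorted, deduplicated union of every site appearing in any profile."""
    return sorted({site for sites in profiles.values() for site in sites})


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical fixed seed, for reproducible profiles
    # Stand-in for load_sites("alexa_top_500_US.txt"); the profile size of 3 is hypothetical.
    top_sites = ["google.com", "facebook.com", "youtube.com", "amazon.com", "yahoo.com"]
    profiles = {
        "US_0": make_profile(top_sites, 3, rng),
        "US_1": make_profile(top_sites, 3, rng),
    }
    print(profiles)
    print(union_of_sites(profiles))
```

Seeding a dedicated random.Random instance keeps the sampled profiles reproducible without touching the global random state.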

Data Analysis

  • create_id_dict.py / cookie_util.py / extract_cookie_ids.py - Extract ID cookies from the two SQLite databases produced by a synchronized crawl, as described in Section 4.5 of the paper.
  • create_graph.py - Builds the cookie linking graph using the parameters set in generate_samples(), as described in Section 4.6 of the paper.
  • db_postprocessing.py / haversine.py - Adds several columns to the crawl databases, including the geocheck described in Section 4.4 of the paper.
  • build_cookie_table.py / Cookie.py - Parses HTTP request and response headers to extract cookies. This code has been integrated into more recent releases of OpenWPM.
    • NOTE: Cookie.py is included in the Python standard library, but its parsing rules are nowhere near what is used in practice. The version here is heavily modified. I recommend using cookies.py, which is based on RFC 6265.
  • identity_parser.py - Parses identity_leaks.txt and prints statistics on the identity leakers it lists.
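
To make the ID-cookie idea concrete, here is a minimal sketch of filtering the cookies seen in two synchronized crawls. It assumes the cookies have already been loaded into dictionaries keyed by (domain, name); the actual scripts read two SQLite databases whose schema is not described here. The checks (stable within a crawl, different between the two browsers, equal length, dissimilar values) and the thresholds MIN_LEN and MAX_SIMILARITY are illustrative assumptions, and Section 4.5 of the paper gives the criteria actually used. Python's difflib.SequenceMatcher stands in as the string-similarity measure.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory layout: (cookie domain, cookie name) -> every value
# observed for that cookie during one crawl.
CookieObs = Dict[Tuple[str, str], List[str]]

# Illustrative thresholds (assumptions, not taken from this repo's code).
MIN_LEN = 8            # shortest value treated as a plausible identifier
MAX_SIMILARITY = 0.55  # SequenceMatcher ratio above which values look related


def looks_like_id(values_a: List[str], values_b: List[str]) -> bool:
    """Heuristically decide whether a cookie behaves like a per-browser ID."""
    # Stable: exactly one distinct value throughout each crawl.
    if len(set(values_a)) != 1 or len(set(values_b)) != 1:
        return False
    a, b = values_a[0], values_b[0]
    # User-specific: the two synchronized browsers received different values.
    if a == b:
        return False
    # Constant length, and long enough to carry an identifier.
    if len(a) != len(b) or len(a) < MIN_LEN:
        return False
    # Dissimilar: rejects pairs, such as timestamps, that differ in only a few characters.
    return SequenceMatcher(None, a, b).ratio() < MAX_SIMILARITY


def find_id_cookies(crawl_a: CookieObs, crawl_b: CookieObs) -> Set[Tuple[str, str]]:
    """Return the cookies seen in both crawls that pass every check."""
    shared = crawl_a.keys() & crawl_b.keys()
    return {key for key in shared if looks_like_id(crawl_a[key], crawl_b[key])}


if __name__ == "__main__":
    crawl_a = {
        ("tracker.example", "uid"): ["a81f3c9e0d2b", "a81f3c9e0d2b"],
        ("news.example", "lang"): ["en", "en"],
    }
    crawl_b = {
        ("tracker.example", "uid"): ["5e7720bb14fa", "5e7720bb14fa"],
        ("news.example", "lang"): ["en", "en"],
    }
    print(find_id_cookies(crawl_a, crawl_b))  # {('tracker.example', 'uid')}
```

Here the "uid" cookie passes every check, while "lang" is rejected because both browsers received the same value. A pair such as "1420070400001" and "1420070400002" is also rejected: the values differ only in their last digit, so their similarity ratio is far above the threshold.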

Data
