
vrde/harvey


Goal

Crawl Tor hidden services to show what type of content is on the hidden network.

There are aggregators, but no complete list of services. ahmia.fi has indexed 4k onions. We need an overview of the hidden network.

Categorize for human rights and

Other info:

  • the average life of a hidden service is very short (1 to 2 weeks)
  • check the average life
  • check the language distribution
  • how to distribute the crawler
  • how do we define what counts as a human-rights-enabling website
  • description.json
  • check for directory pages
  • qualitative analysis
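To "check the average life" of hidden services, one option is to crawl the same set of onions repeatedly and record the dates on which each one answered. The sketch below assumes exactly that: the `observations` log, the `service-*.onion` host names and the dates are all hypothetical, and `lifetimes_in_days` is an illustrative helper, not part of harvey.

```python
from datetime import date
from statistics import mean, median

# Hypothetical crawl log: for each onion host, the dates on which a crawl
# found it reachable. Host names and dates are made up for illustration.
observations: dict[str, list[date]] = {
    "service-a.onion": [date(2015, 3, 1), date(2015, 3, 8), date(2015, 3, 15)],
    "service-b.onion": [date(2015, 3, 1), date(2015, 3, 4)],
    "service-c.onion": [date(2015, 3, 2)],
}


def lifetimes_in_days(obs: dict[str, list[date]]) -> dict[str, int]:
    """Observed lifetime = last successful crawl minus first successful crawl.

    This is only a lower bound: a service may have existed before the first
    crawl that saw it and may still be up after the last one.
    """
    return {host: (max(days) - min(days)).days for host, days in obs.items() if days}


lifetimes = lifetimes_in_days(observations)
print(lifetimes)  # → {'service-a.onion': 14, 'service-b.onion': 3, 'service-c.onion': 0}
print("mean:", round(mean(lifetimes.values()), 1), "median:", median(lifetimes.values()))
# → mean: 5.7 median: 3
```

Reporting the median next to the mean is a deliberate choice here: if most services vanish within days while a few stay up for years, the mean alone would overstate the typical lifetime.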

URL gathering:
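Gathering URLs largely means harvesting onion addresses from pages that are already known (aggregators, directory pages, previously crawled services). A minimal sketch, assuming plain-text or HTML input: onion hostnames are 16 base32 characters (legacy v2) or 56 base32 characters (v3), using the alphabet `a-z` and `2-7`. `ONION_RE` and `extract_onions` are hypothetical names.

```python
import re

# Onion hostnames: 56 base32 chars (v3) or 16 base32 chars (legacy v2).
# The longer alternative is tried first; \b keeps us from matching the
# tail of a longer alphanumeric run.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b", re.IGNORECASE)


def extract_onions(text: str) -> set[str]:
    """Return the distinct onion hostnames mentioned in a page, lower-cased."""
    return {m.group(0).lower() for m in ONION_RE.finditer(text)}


page = "See http://abcdefghijklmnop.onion/ and ABCDEFGHIJKLMNOP.onion/index"
print(extract_onions(page))  # → {'abcdefghijklmnop.onion'}
```

Only the hostname is kept, so that the same service reached through different paths counts once when measuring how many onions exist.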

Moar links:

The software architecture of this crawler is heavily inspired by "Web Crawling", by Christopher Olston and Marc Najork. To Christopher and Marc: a huge thank you for your work.
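Two ideas from that line of work also bear on "how to distribute the crawler": partition the URL space across workers by hashing the host, and keep one queue per host so that no single hidden service is hit too often. The sketch below is one possible reading of those ideas, not harvey's actual code; `owner`, `Frontier`, and the 10-second `delay` are hypothetical choices.

```python
import hashlib
import heapq
from collections import defaultdict, deque
from typing import Optional
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    return urlsplit(url).hostname or ""


def owner(url: str, n_workers: int) -> int:
    """Map a URL to a crawler worker by hashing its host, so every page of a
    hidden service is fetched (and rate-limited) by the same worker."""
    digest = hashlib.sha1(host_of(url).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_workers


class Frontier:
    """Per-host FIFO queues plus a heap of (earliest_fetch_time, host)."""

    def __init__(self, delay: float = 10.0) -> None:
        self.delay = delay  # hypothetical politeness delay per host, in seconds
        self.queues: defaultdict[str, deque[str]] = defaultdict(deque)
        self.next_allowed: dict[str, float] = {}
        self.ready: list[tuple[float, str]] = []  # only hosts with pending URLs
        self.seen: set[str] = set()

    def add(self, url: str, now: float) -> None:
        if url in self.seen:  # naive duplicate-URL elimination
            return
        self.seen.add(url)
        host = host_of(url)
        if not self.queues[host]:  # host had nothing pending: schedule it
            when = max(now, self.next_allowed.get(host, now))
            heapq.heappush(self.ready, (when, host))
        self.queues[host].append(url)

    def next_url(self, now: float) -> Optional[str]:
        """Return a URL whose host may be contacted at time `now`, else None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.next_allowed[host] = now + self.delay
        if self.queues[host]:  # reschedule the host after the delay
            heapq.heappush(self.ready, (self.next_allowed[host], host))
        return url
```

Each worker would run its own `Frontier` and forward any discovered URL for which `owner(url, n_workers)` is not its own index to the worker that owns it.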

About

yet another crawler
