Crawl Tor hidden services to show what type of content is on the hidden network.
There are aggregators, but no complete list of services; ahmia.fi indexed 4k onions. We need an overview of the hidden net.
Categorize for human rights and
Other info:
- average life of a hidden service is very short (1 to 2 weeks)
- check the average life
- check the language distribution
- how to distribute the crawler
- how do we define which websites are human rights enablers
- description.json
- check for directory pages
- qualitative analysis
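One possible way to check the average life, sketched under assumptions: the crawler records observations as (onion, timestamp, reachable) tuples, and a service's life is taken as the time between its first and last successful visit. The function name and the tuple layout are hypothetical, not part of this project. Services that are still alive at the end of the measurement are undercounted, so the result is a lower bound.

```python
from datetime import datetime, timedelta

def average_lifetime(observations):
    """Average (last seen - first seen) over all services seen at least once.

    observations: iterable of (onion, when, reachable) tuples (assumed layout).
    Returns a timedelta, or None if no service was ever reachable.
    """
    first_seen = {}
    last_seen = {}
    for onion, when, reachable in observations:
        if not reachable:
            continue  # failed visits do not extend the observed life
        if onion not in first_seen or when < first_seen[onion]:
            first_seen[onion] = when
        if onion not in last_seen or when > last_seen[onion]:
            last_seen[onion] = when
    if not first_seen:
        return None
    total = sum((last_seen[o] - first_seen[o] for o in first_seen), timedelta())
    return total / len(first_seen)
```

A service seen only once counts as a life of zero, which pulls the average down; crawling each service more often tightens the estimate.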
URLs gathering:
- The Hidden Wiki
- Onion Url Repository
- some subreddits
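Pages from sources like these can be scanned for onion hostnames to seed the crawl. The following is a minimal sketch, not this project's code: it assumes addresses are either 16 base32 characters (the older format) or 56 base32 characters (the newer format) followed by ".onion", and the names ONION_RE and extract_onions are made up for illustration.

```python
import re

# 16-char (older) or 56-char (newer) base32 label followed by ".onion".
# The lookbehind rejects labels that are longer than the expected length.
ONION_RE = re.compile(
    r"(?<![a-z2-7])(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b",
    re.IGNORECASE,
)

def extract_onions(text):
    """Return unique onion hostnames found in text, lowercased, in order."""
    seen = set()
    result = []
    for match in ONION_RE.finditer(text):
        host = match.group(0).lower()
        if host not in seen:
            seen.add(host)
            result.append(host)
    return result
```

The pattern only checks the shape of the address; it does not verify that the service exists or is reachable.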
Moar links:
- https://events.ccc.de/camp/2015/wiki/Session:Deep_Graphics
- https://www.youtube.com/watch?v=-oTEoLB-ses
- https://en.wikipedia.org/wiki/List_of_Tor_hidden_services#cite_note-1
The software architecture of this crawler is heavily inspired by "Web Crawling", by Christopher Olston and Marc Najork. To Christopher and Marc: a huge thank you for your work.
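As a rough illustration of the kind of building blocks such an architecture uses, here is a minimal sketch of a URL frontier with duplicate elimination, plus a hash-based assignment of hosts to workers (one common answer to the distribution question). This is an assumption-laden example, not this crawler's actual code: Frontier, worker_for and num_workers are hypothetical names.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit

def worker_for(url, num_workers):
    """Map a URL to a worker index by hashing its hostname.

    All URLs of one host land on the same worker, so per-host
    state stays local to that worker.
    """
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

class Frontier:
    """FIFO queue of URLs to visit that never enqueues the same URL twice."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def add(self, url):
        """Enqueue url if it is new; return True if it was added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next(self):
        """Return the next URL to crawl, or None if the frontier is empty."""
        return self._queue.popleft() if self._queue else None
```

A FIFO frontier gives a breadth-first crawl; a real deployment would likely add per-host politeness delays and revisit scheduling, which are left out here.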