
public bad code that crawls tor for terrible homemade spaghetti analytics | "Great repository names are short and memorable. Need inspiration? How about urban-fiesta."


general-programming/torspider


torspider

It does things that crawl Tor.

The initial ideas were inspired by terrible jokes on Discord about Tor analytics. Much of the crawling code comes from this crawler, to avoid reinventing the wheel.

Licence is AGPL.

Notes

  • docker-compose run --rm spider python init_db.py - initializes the database.
  • docker-compose up --scale spider=4 -d - starts the stack in the background with four spider containers crawling in parallel.
  • Rebloom (the Redis Bloom filter module) is required; it is used to filter out duplicate URLs.
  • POSTGRES_URL is assumed to point at a connection pooler that does its own pooling, such as PgBouncer or Pgpool-II.
  • Postgres is required, because the code relies on Postgres-specific features.
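Rebloom keeps the duplicate-URL filter inside Redis, so every scaled spider container shares the same set of "seen" URLs. The sketch below is an in-process Python illustration of the Bloom filter technique itself, not torspider's actual code; the class, the `should_crawl` helper, and the sizing defaults are hypothetical. With redis-py, the server-side equivalent of `add` would be `r.execute_command("BF.ADD", "seen_urls", url)`, which returns 1 when the item is newly added; the key name `seen_urls` is an assumption.

```python
import hashlib
import math


class BloomFilter:
    """In-process Bloom filter illustrating what Rebloom does inside Redis."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> bool:
        """Set the item's bits; return True if the item was definitely not seen before."""
        new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                new = True
                self.bits[byte] |= 1 << bit
        return new

    def __contains__(self, item: str) -> bool:
        # True means "probably seen" (false positives possible); False means "never added".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


# Hypothetical shared filter; in torspider the equivalent state lives in Redis via Rebloom.
seen = BloomFilter(capacity=100_000)


def should_crawl(url: str) -> bool:
    # Crawl only URLs the filter has not (probably) seen before.
    return seen.add(url)
```

A Bloom filter never gives false negatives, so a URL that was already queued is always skipped; the trade-off is that a small fraction of genuinely new URLs (about `error_rate`) may be skipped as false positives, in exchange for constant memory regardless of how many URLs are crawled.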
