# CheGuevaraTools
A multithreaded scraper with support for HTTP proxies and custom headers.
-
Download and install dependencies
pip install -r requirements.txt
Or install as a library:
pip install git+https://github.com/mmenchu/CheGuevaraTools.git
-
Download HideMyAss proxy zip files to a temporary folder
python proxy_zip_downloader.py gmail_usrname gmail_pass path_tmp_fldr
-
Run the example. This fetches about 900 URLs through roughly 300 proxies.
python example.py
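For readers who want a feel for the general technique, here is a minimal sketch of fetching URLs through rotating HTTP proxies with custom headers, using only the Python standard library. It is not the code in example.py and does not use this project's classes; the function names (`assign_proxies`, `fetch`, `fetch_all`), the round-robin assignment, and the `http://host:port` proxy format are all assumptions made for illustration.

```python
import http.client
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def assign_proxies(urls, proxies):
    """Pair each URL with a proxy, cycling through the proxy list (hypothetical helper)."""
    return list(zip(urls, itertools.cycle(proxies)))


def fetch(url, proxy, headers, timeout=10):
    """Fetch one URL through one proxy; return (url, body) or (url, exception)."""
    # Assumes the proxy is given as "http://host:port".
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=timeout) as response:
            return url, response.read()
    except (OSError, http.client.HTTPException) as exc:
        # Dead proxies are common, so report the failure instead of raising.
        return url, exc


def fetch_all(urls, proxies, headers, workers=20):
    """Fetch all URLs concurrently with a thread pool."""
    jobs = assign_proxies(urls, proxies)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: fetch(job[0], job[1], headers), jobs))
```

A call might look like `fetch_all(urls, proxies, {"User-Agent": "my-scraper"})`, where `urls` and `proxies` are lists of strings. Threads suit this step because the work is dominated by waiting on the network.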
-
Even though a custom scraper can be passed to ThreadedServiceFetcherManager, it is probably best to return the raw HTML without parsing it and to extract the data in a separate script that uses multiple processes.
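The separate parsing step could be sketched as follows, assuming the fetched pages are already available as a list of HTML strings. This is only an illustration: `TitleParser`, `extract_title`, and `parse_pages` are hypothetical names, and pulling out the `<title>` stands in for whatever data a real scraper would extract.

```python
from html.parser import HTMLParser
from multiprocessing import Pool


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def extract_title(html):
    """Return the stripped page title, or None if the page has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title else None


def parse_pages(pages, processes=2):
    """Parse pages in parallel worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(extract_title, pages)


if __name__ == "__main__":
    pages = [
        "<html><head><title>First</title></head></html>",
        "<html><head><title> Second </title></head></html>",
    ]
    print(parse_pages(pages))
```

Parsing is CPU-bound, so separate processes can use several cores, whereas the fetching threads spend most of their time waiting on the network.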
-
Set the limit on open file descriptors for the shell session
ulimit -n 1024
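As an alternative to the `ulimit -n 1024` command above, the soft limit can also be checked and raised from inside a Python script on Unix-like systems. This is a sketch, not part of this project: `ensure_open_file_limit` is a hypothetical helper, and it only ever raises the soft limit, never lowers it.

```python
import resource


def ensure_open_file_limit(wanted=1024):
    """Raise the soft open-file limit to `wanted` if it is lower (Unix only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        # The soft limit cannot exceed the hard limit without privileges.
        if hard == resource.RLIM_INFINITY:
            target = wanted
        else:
            target = min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    # Return the soft limit now in effect.
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `ensure_open_file_limit()` before starting the fetch threads affects only the current process and its children, not the surrounding shell.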
Miguel Menchu mmenchu@gmail.com