GitHub - mmenchu/CheGuevaraTools: Utility Scripts in Python

# CheGuevaraTools

Scraper

A multithreaded scraper with support for HTTP proxies and custom headers.

1. Install the dependencies with `pip install -r requirements.txt`, or install as a library with `pip install git+https://github.com/mmenchu/CheGuevaraTools.git`.
2. Download the HideMyAss proxy zip files to a temporary folder: `python proxy_zip_downloader.py gmail_usrname gmail_pass path_tmp_fldr`
3. Run the example, which fetches ~900 URLs through around 300 proxies: `python example.py`
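
The idea behind the scraper (a thread pool that sends each request through a proxy with custom headers) can be sketched with the standard library alone. This is an illustrative sketch, not the API of `Scraper.py` or `ThreadedServiceFetcherManager`: the names `fetch` and `fetch_all`, the `fetcher` parameter, and the round-robin proxy assignment are all assumptions made for the example, and proxies are assumed to be full URLs such as `http://host:port`.

```python
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, proxy=None, headers=None, timeout=10):
    """Fetch one URL through an optional HTTP proxy and return the raw body."""
    handlers = []
    if proxy:
        # Assumption: proxy is a full URL such as "http://host:port".
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers=headers or {})
    with opener.open(request, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, proxies, headers=None, workers=20, fetcher=fetch):
    """Fetch URLs concurrently, assigning proxies round-robin (hypothetical policy).

    Returns a dict mapping each URL to its body, or to the exception it raised.
    """
    # With no proxies, every request goes out directly (proxy=None).
    assignments = list(zip(urls, itertools.cycle(proxies or [None])))

    def task(pair):
        url, proxy = pair
        try:
            return url, fetcher(url, proxy=proxy, headers=headers)
        except Exception as exc:  # one dead proxy should not stop the batch
            return url, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, assignments))
```

Failures are stored alongside successes so the caller can retry the failed URLs with different proxies afterwards. The `fetcher` argument exists only so the function can be exercised without network access.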

Notes

Even though a custom scraper can be passed to ThreadedServiceFetcherManager, it is probably best to return the HTML without parsing it and extract the data in a separate script using multiple processes.
Run `ulimit -n 1024` first. This sets the shell's limit on open file descriptors, and each concurrent connection uses one.
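
The first note, parsing the saved HTML in a separate multi-process step, might look roughly like the sketch below. It is only an illustration under assumptions: the `TitleParser` class, the `extract_title` and `parse_pages` functions, and the choice of extracting the page title are all hypothetical and not part of this repository.

```python
from html.parser import HTMLParser
from multiprocessing import Pool


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element (example extraction only)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


def extract_title(html):
    """Parse one HTML document (as a str) and return its stripped title."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def parse_pages(pages, processes=4):
    """Parse already-downloaded HTML strings in parallel worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(extract_title, pages)
```

Parsing is CPU-bound, so processes help where threads would not. On platforms that start workers by spawning a fresh interpreter, the call to `parse_pages` should sit under an `if __name__ == "__main__":` guard.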

Miguel Menchu mmenchu@gmail.com
