This is a simple Python 3 crawler. Master code is taken from dmahugh's source code. Modifying it to serve a purpose.

ehmoni/CrawlerPy3


CrawlerPy3

This is a simple Python 3 crawler. The base code is taken from dmahugh's source code. He has a nice tutorial on writing crawlers/spiders:

http://mahugh.com/2015/12/12/crawling-the-web-with-python-3-x/

I am modifying it to serve a specific purpose (for SAIL). Most of the customization took place in the pagehandler, which has been edited to work on CNN web pages.
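To illustrate what a page handler does, here is a minimal sketch that pulls the title and paragraph text out of one page. It is not the pagehandler in this repository: the function signature, the `ArticleTextParser` class, and the sample HTML are all assumptions made for the example, and it uses only the standard library rather than whatever parser the original code relies on. Real CNN pages would almost certainly need more specific selection rules.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and the text of every <p>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            # Start collecting text for a new paragraph.
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_paragraph:
            # Text inside nested inline tags (e.g. <b>) is kept as well.
            self._buffer.append(data)


def pagehandler(pageurl, html):
    """Assumed signature: take a URL and its HTML, return the extracted fields."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return {
        "url": pageurl,
        "title": parser.title.strip(),
        "paragraphs": parser.paragraphs,
    }


# Made-up sample page, so the sketch runs without network access.
sample = (
    "<html><head><title>Sample story</title></head>"
    "<body><p>First paragraph.</p><p>Second <b>one</b>.</p></body></html>"
)
result = pagehandler("http://example.com/story", sample)
print(result["title"])
print(result["paragraphs"])
```

Keeping the handler as a plain function that takes the page and returns data means the crawl loop does not need to know anything about the site being scraped.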

It could be made more efficient and brought closer to good practices (for example, by reducing its reliance on global), but this version should serve the purpose.
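One possible way to avoid global state, shown only as a sketch, is to keep the crawl queue and the visited set on an object. The `Crawler` class, the `fetch_links` callable, and the fake site below are hypothetical names introduced for this example and do not come from the repository.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


class Crawler:
    """Hypothetical crawler that keeps its state on the instance, not in globals."""

    def __init__(self, fetch_links, max_pages=10):
        # fetch_links is an assumed callable: URL in, list of links found out.
        self.fetch_links = fetch_links
        self.max_pages = max_pages
        self.visited = set()
        self.order = []
        self.queue = deque()

    def crawl(self, start_url):
        """Breadth-first crawl; returns the URLs in the order they were visited."""
        self.queue.append(start_url)
        while self.queue and len(self.visited) < self.max_pages:
            url = self.queue.popleft()
            if url in self.visited:
                continue
            self.visited.add(url)
            self.order.append(url)
            for link in self.fetch_links(url):
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(url, link))
                if absolute not in self.visited:
                    self.queue.append(absolute)
        return self.order


# Made-up site map standing in for real HTTP requests.
fake_site = {
    "http://example.com/": ["/a", "/b"],
    "http://example.com/a": ["/b", "/#top"],
    "http://example.com/b": [],
}
crawler = Crawler(fetch_links=lambda url: fake_site.get(url, []))
order = crawler.crawl("http://example.com/")
print(order)
```

Because the link fetcher is passed in, the same loop can be exercised against a dictionary like the one above or against real requests.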
