This is a simple Python 3 crawler. The base code is taken from dmahugh's source code. He has a nice tutorial on writing crawlers/spiders:
http://mahugh.com/2015/12/12/crawling-the-web-with-python-3-x/
I am modifying it to serve a specific purpose (for SAIL). Most of the customization took place in the pagehandler, which has been edited to work on CNN web pages.
The crawler could be made more efficient and brought in line with good practices (for example, by avoiding the use of global variables), but for this version it should serve the purpose.
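
To illustrate the kind of page handler described above, here is a minimal sketch using only the Python 3 standard library. It is not the handler from this repository: the function signature, the ParagraphExtractor class, and the choice to collect text from <p> elements are all assumptions made for illustration, and real CNN pages would likely need more specific selection rules.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_p = False      # True while the parser is inside a <p> element
        self._buffer = []       # text fragments of the current paragraph
        self.paragraphs = []    # finished paragraphs, in document order

    def _flush(self):
        # Join the fragments, collapse runs of whitespace, keep non-empty text.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_p:      # previous <p> was never closed explicitly
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def pagehandler(pageurl, html):
    """Assumed signature: take a URL and its HTML, return the extracted text.

    Returning a value (instead of writing to a global) keeps the handler
    easy to test on its own.
    """
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return {"url": pageurl, "paragraphs": parser.paragraphs}


# Small inline sample so the sketch runs without network access.
sample = (
    "<html><body><h1>Title</h1>"
    "<p>First <b>para</b>.</p><div>skip</div><p>Second.</p>"
    "</body></html>"
)
result = pagehandler("http://example.com/story", sample)
```

Because the handler returns its result rather than updating shared state, the crawl loop can decide what to do with each page (store it, print it, skip it), which is one possible way to reduce the reliance on globals mentioned above.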