- Replace robotparser with the "Nikita the Spider" parser (robotexclusionrulesparser)
- multi-threading
- robotparser - https://docs.python.org/2/library/robotparser.html
- urllib2 - https://docs.python.org/2/library/urllib2.html
- HTMLParser - https://docs.python.org/2/library/htmlparser.html
- Google Web Search API (deprecated) - http://ajax.googleapis.com/ajax/services/search/web?q=dog&v=3.0&rsz=8&start=0
- pygoogle - http://pygoogle.googlecode.com/svn/trunk/pygoogle.py
- Threading
- TF-IDF (see example) - https://en.wikipedia.org/wiki/Tf%E2%80%93idf
- Cosine scoring
- collections.Counter - https://docs.python.org/2/library/collections.html
- hashlib - https://docs.python.org/2/library/hashlib.html#module-hashlib
- Dictionary of <key, value> pairs where each value is a list
- requests - http://docs.python-requests.org/en/latest/user/quickstart/
- urlparse
- Converting Unicode strings to regular (byte) strings - http://stackoverflow.com/questions/4855645/how-to-turn-unicode-strings-into-regular-strings
- Priority Queue - http://www.bogotobogo.com/python/python_PriorityQueue_heapq_Data_Structure.php
- static method - http://stackoverflow.com/questions/735975/static-methods-in-python
- BeautifulSoup - http://www.crummy.com/software/BeautifulSoup/
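The TF-IDF, cosine scoring and collections.Counter entries fit together as a small ranking routine. The sketch below shows that combination. It assumes whitespace tokenization and the plain weighting `count * log(N / df)`, which is only one of several TF-IDF variants, and the names `cosine` and `rank` are hypothetical. It is written for Python 3, so it does not match the Python 2 docs linked above line for line.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices sorted by TF-IDF cosine score against query.

    Assumption: lowercase + whitespace tokenization, weight = tf * log(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    idf = {t: math.log(n_docs / c) for t, c in df.items()}

    def vectorize(tokens):
        # Terms never seen in the collection get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

    doc_vecs = [vectorize(tokens) for tokens in tokenized]
    q_vec = vectorize(query.lower().split())
    scores = [cosine(q_vec, d) for d in doc_vecs]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
```

For example, `rank("dog barks", ["the dog barks", "the cat meows", "a dog and a cat"])` puts document 0 first because it is the only one that contains the rare term "barks".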
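The robotparser, HTMLParser and urlparse entries cover the per-page steps of a crawler: pull links out of the HTML, make them absolute, then check robots.txt. Below is a minimal offline sketch under these assumptions. It uses the Python 3 names of the same modules (`html.parser`, `urllib.parse`, `urllib.robotparser`), and the caller is expected to have already downloaded the page and the robots.txt text (for instance with urllib2/requests). `LinkExtractor`, `extract_links` and `allowed_by_robots` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop #fragments.
                url, _ = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript:, etc.
                if urlparse(url).scheme in ("http", "https"):
                    self.links.append(url)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def allowed_by_robots(robots_txt, user_agent, url):
    """Check url against robots.txt content the caller already fetched."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

A robots.txt check in this form needs no network access, so it is easy to test. A real crawler would cache one `RobotFileParser` per host and would not re-parse the file for every URL.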
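Several entries (priority queue, hashlib, and a key mapped to a list of values) can be combined into a crawl frontier. The hypothetical `Frontier` class below is a sketch, not the project's design. It assumes that a lower number means higher priority, which is `heapq`'s min-heap order. It uses SHA-1 page digests to spot duplicate content, and it keeps a `defaultdict(list)` from each URL to the pages that link to it.

```python
import hashlib
import heapq
from collections import defaultdict


class Frontier:
    """Priority-ordered URL frontier that skips URLs and pages already seen."""

    def __init__(self):
        self._heap = []                   # (priority, counter, url); lowest pops first
        self._counter = 0                 # tie-breaker: equal priorities pop in push order
        self._seen_urls = set()
        self._seen_digests = set()        # SHA-1 hex digests of page bodies
        self.inlinks = defaultdict(list)  # url -> list of pages linking to it

    def push(self, url, priority, source=None):
        """Queue url unless already seen; always record the inlink."""
        if source is not None:
            self.inlinks[url].append(source)
        if url in self._seen_urls:
            return False
        self._seen_urls.add(url)
        heapq.heappush(self._heap, (priority, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        """Remove and return the highest-priority (lowest number) URL."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

    def is_duplicate_content(self, body):
        """body is bytes; True if an identical page was seen before."""
        digest = hashlib.sha1(body).hexdigest()
        if digest in self._seen_digests:
            return True
        self._seen_digests.add(digest)
        return False
```

The counter in each heap entry means URLs are never compared directly when priorities are equal. This keeps ordering deterministic.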