This is a Python-based web crawler built on the original crawler from Udacity's CS101 course.
The front end is currently a console-based query system. The crawler uses BeautifulSoup to parse HTML and can read gzip-compressed cache files. Run driver.py to choose from the following options:
- View entire index
- View all words encountered
- View ranks of pages visited
- Query the index
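The index, ranking and query options above can be sketched roughly as follows. This is a minimal illustration modelled on the CS101 approach (a dictionary mapping each word to the URLs it appears on, plus iterative PageRank with a damping factor of 0.8 over 10 loops). It is not taken from this repository's driver.py, and every function name here, including the hypothetical `ranked_lookup` helper, is an assumption for illustration.

```python
from collections import defaultdict


def add_page_to_index(index, url, content):
    """Record each whitespace-separated word of `content` as appearing on `url`."""
    for word in content.split():
        if url not in index[word]:
            index[word].append(url)


def lookup(index, keyword):
    """Return the URLs containing `keyword`, or an empty list if it was never seen."""
    return index.get(keyword, [])


def compute_ranks(graph, damping=0.8, numloops=10):
    """Iterative PageRank; `graph` maps each page to the list of pages it links to."""
    npages = len(graph)
    ranks = {page: 1.0 / npages for page in graph}
    for _ in range(numloops):
        newranks = {}
        for page in graph:
            # Every page gets a base share, plus a share from each page linking to it.
            newrank = (1 - damping) / npages
            for node, outlinks in graph.items():
                if page in outlinks:
                    newrank += damping * ranks[node] / len(outlinks)
            newranks[page] = newrank
        ranks = newranks
    return ranks


def ranked_lookup(index, ranks, keyword):
    """Hypothetical helper: query results sorted from highest to lowest rank."""
    return sorted(lookup(index, keyword), key=lambda url: ranks.get(url, 0.0), reverse=True)
```

Usage: after building `index = defaultdict(list)` and calling `add_page_to_index` for each crawled page, `list(index)` gives all words encountered, `compute_ranks(graph)` gives page ranks, and `lookup(index, "word")` answers a query.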
Note: The crawler does not currently respect robots.txt, so changing the seed page to an inappropriate site is not recommended.
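The BeautifulSoup parsing and gzip cache reading mentioned above might look something like the sketch below. It assumes that each cache file holds a single UTF-8 HTML page compressed with gzip; the actual cache layout used by this project is not documented here. The function names are hypothetical, and only the standard-library `gzip` and `urllib.parse` modules and BeautifulSoup's `find_all` and `get_text` are used.

```python
import gzip
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def read_cached_page(path):
    """Return the HTML stored in a gzip-compressed cache file (assumed UTF-8)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return f.read()


def get_all_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # urljoin resolves relative links such as "/about" against the page's URL.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def get_page_text(html):
    """Return the visible text of the page, suitable for feeding into an index."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(" ")
```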