Web Crawler

This is a Python-based web crawler built upon Udacity's CS101 original.

The front end is currently a console-based query system. The crawler uses BeautifulSoup to parse HTML and can read gzip cache files. Run driver.py to choose from the following options:

View entire index
View all words encountered
View ranks of pages visited
Query the index
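
The options above revolve around a keyword index and a set of page ranks. As a rough illustration of how these could fit together, here is a minimal sketch in the style of the CS101 search engine. The function names (build_index, compute_ranks, query), the damping factor of 0.8, the 10 iterations, and the sample data are all assumptions made for this example; they are not taken from crawler.py or driver.py.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the list of URLs whose text contains it."""
    index = defaultdict(list)
    for url, text in pages.items():
        for word in text.lower().split():
            if url not in index[word]:
                index[word].append(url)
    return index


def compute_ranks(graph, damping=0.8, loops=10):
    """Iteratively rank pages from a {url: [outgoing urls]} link graph."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(loops):
        new_ranks = {}
        for page in graph:
            # Base share every page receives, plus a share from each in-link.
            rank = (1 - damping) / n
            for other, links in graph.items():
                if page in links:
                    rank += damping * ranks[other] / len(links)
            new_ranks[page] = rank
        ranks = new_ranks
    return ranks


def query(index, ranks, word):
    """Return the URLs containing `word`, highest-ranked first."""
    urls = index.get(word.lower(), [])
    return sorted(urls, key=lambda u: ranks.get(u, 0.0), reverse=True)


# Hypothetical crawl results, for illustration only.
pages = {
    "http://a.example": "python web crawler",
    "http://b.example": "python search index",
}
graph = {
    "http://a.example": ["http://b.example"],
    "http://b.example": [],
}
index = build_index(pages)
ranks = compute_ranks(graph)
print(query(index, ranks, "python"))
```

In this sample, http://b.example is listed first because it receives a link from http://a.example, while http://a.example has no incoming links.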

Note: The crawler does not yet respect robots.txt, so changing the seed page carelessly is not recommended.
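
For readers who want to check a seed page by hand before crawling it, the following sketch shows one possible way to test URLs against robots.txt rules using Python's standard urllib.robotparser module. It is not part of this crawler; the helper name make_robots_checker and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_lines, user_agent="*"):
    """Build a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules already held in memory

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical robots.txt content, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
""".splitlines()

allowed = make_robots_checker(robots_txt)
print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/data.html"))
```

With these sample rules, the first URL is permitted and the second is disallowed. To check a live site, RobotFileParser can instead be pointed at the site's robots.txt with set_url() followed by read().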

