mknd7/web-crawler

Web Crawler

This is a Python-based web crawler built on the original crawler from Udacity's CS101 course.

The front end is currently a console-based query system. The crawler uses BeautifulSoup to parse HTML and can read gzip cache files. Run driver.py to choose from the following options:

  • View entire index
  • View all words encountered
  • View ranks of pages visited
  • Query the index
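
As a rough illustration of what these four options might do, here is a minimal sketch. It assumes the index is a dict mapping each word to a list of URLs and that ranks is a dict mapping each URL to a score, as in the CS101 course material. The function names and sample data are hypothetical and are not taken from driver.py.

```python
# Hypothetical data shapes modeled on the CS101 course crawler;
# the real structures used by driver.py may differ.
index = {
    "crawler": ["http://example.com/a", "http://example.com/b"],
    "python": ["http://example.com/a"],
}
ranks = {"http://example.com/a": 0.6, "http://example.com/b": 0.4}


def view_index(index):
    """Return (word, urls) pairs sorted alphabetically by word."""
    return sorted(index.items())


def view_words(index):
    """Return every indexed word in alphabetical order."""
    return sorted(index)


def view_ranks(ranks):
    """Return (url, rank) pairs with the highest rank first."""
    return sorted(ranks.items(), key=lambda item: item[1], reverse=True)


def query(index, ranks, word):
    """Return the URLs containing `word`, best-ranked first."""
    urls = index.get(word.lower(), [])
    return sorted(urls, key=lambda url: ranks.get(url, 0.0), reverse=True)


if __name__ == "__main__":
    print(query(index, ranks, "crawler"))
```

Sorting query results by rank is a design choice in this sketch; the project may order results differently.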

Note: The crawler does not currently respect robots.txt, so changing the seed page to an inappropriate target is discouraged.
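
For reference, a robots.txt check could be sketched with Python's standard `urllib.robotparser` module. This is not part of the project. The helper names, the `web-crawler` user agent string, and the sample rules below are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_lines):
    """Build a parser from the lines of a robots.txt file.

    In a live crawl one would instead call parser.set_url(...) and
    parser.read() to download the file; parsing lines keeps this offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def is_allowed(parser, url, user_agent="web-crawler"):
    """Return True if the rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules: block everything under /private/ for all agents.
    rules = ["User-agent: *", "Disallow: /private/"]
    parser = build_parser(rules)
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, is_allowed(parser, url))
```

A crawler would call such a check before fetching each URL and skip any page for which it returns False.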
