
nhirakawa/PyCrawler

PyCrawler

  • A web crawler written in Python (requires Python 2.7)
  • Collects HTML and PDF files only
  • Always obeys robots.txt policies
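The two crawl rules above (obey robots.txt, keep only HTML and PDF) can be sketched with the standard library's urllib.robotparser. This is a minimal Python 3 illustration under stated assumptions, not PyCrawler's own (Python 2.7) code: the helper names make_robot_parser, may_fetch and is_collectable, the "PyCrawler" user-agent string, and the decision to filter on the Content-Type header are all hypothetical.

```python
# Hypothetical Python 3 sketch, not PyCrawler's actual implementation.
from urllib import robotparser

# Media types the crawler keeps (HTML and PDF); everything else is skipped.
ALLOWED_TYPES = {"text/html", "application/pdf"}


def make_robot_parser(robots_lines):
    """Build a RobotFileParser from the lines of an already-downloaded robots.txt."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return rp


def may_fetch(rp, url, user_agent="PyCrawler"):
    """Return True if robots.txt allows user_agent to fetch url (assumed UA name)."""
    return rp.can_fetch(user_agent, url)


def is_collectable(content_type_header):
    """Keep only HTML and PDF responses, ignoring parameters such as charset."""
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_TYPES
```

For example, with rp = make_robot_parser(["User-agent: *", "Disallow: /private/"]), may_fetch(rp, "http://example.com/private/page.html") returns False while may_fetch(rp, "http://example.com/index.html") returns True, and is_collectable("text/html; charset=utf-8") returns True.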

How to Run

  • Run $ python PyCrawler.py
  • Results are written to results.txt in the current working directory

Command Line Arguments

-s, --source The URL to start at

-l, --limit Maximum number of URLs to crawl

-w, --wait Time, in seconds, to wait between requests

-d, --domain Restrict crawling to URLs on the given domain
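One way the four options above could drive a crawl is sketched below in Python 3 with argparse and a breadth-first queue. Everything beyond the flag names is an assumption: the argument types and defaults, the rule that subdomains count as "on" the domain, and the get_links callback, which stands in for whatever fetching and link extraction the real crawler does (and which is passed in so the loop can be exercised without network access).

```python
# Hypothetical Python 3 sketch; types, defaults and helper names are assumptions.
import argparse
import time
from collections import deque
from urllib.parse import urlparse


def build_parser():
    """Parser mirroring the documented -s/-l/-w/-d flags."""
    parser = argparse.ArgumentParser(description="A web crawler")
    parser.add_argument("-s", "--source", required=True, help="The URL to start at")
    parser.add_argument("-l", "--limit", type=int, default=100,
                        help="Maximum number of URLs to crawl")
    parser.add_argument("-w", "--wait", type=float, default=1.0,
                        help="Seconds to wait between requests")
    parser.add_argument("-d", "--domain", default=None,
                        help="Only crawl URLs on this domain")
    return parser


def in_domain(url, domain):
    """True if url's host is domain or one of its subdomains (assumed rule)."""
    if domain is None:
        return True
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def crawl(source, limit, wait, domain, get_links):
    """Breadth-first crawl; get_links(url) is a hypothetical fetch-and-parse hook."""
    queue = deque([source])
    seen = {source}
    crawled = []
    while queue and len(crawled) < limit:
        url = queue.popleft()
        if not in_domain(url, domain):
            continue  # outside --domain: skip without fetching
        if crawled:
            time.sleep(wait)  # pause between consecutive requests (--wait)
        crawled.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled
```

A run might look like args = build_parser().parse_args(["-s", "http://example.com/", "-l", "50", "-d", "example.com"]) followed by crawl(args.source, args.limit, args.wait, args.domain, get_links). Skipping out-of-domain URLs when they are dequeued, rather than never enqueueing them, keeps the domain check in one place at the cost of holding a few extra URLs in memory.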
