
PyCrawler

A web crawler written in Python (requires Python 2.7)
Collects HTML and PDF files only
Always obeys robots.txt policies
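
The two filters above can be sketched as follows. This is a minimal sketch written for Python 3 (on Python 2.7 the same class lives in the robotparser module) and is not PyCrawler's actual code. The helper names make_robot_checker and is_collectable, and the "PyCrawler" user-agent string, are hypothetical. robots.txt rules are checked with the standard library's urllib.robotparser.RobotFileParser. A response is kept only if its Content-Type header is HTML or PDF.

```python
from urllib.robotparser import RobotFileParser

# MIME types the crawler keeps; everything else is skipped.
ALLOWED_TYPES = ("text/html", "application/pdf")


def is_collectable(content_type):
    """True for HTML or PDF responses; ignores parameters like '; charset=utf-8'."""
    if not content_type:
        return False
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ALLOWED_TYPES


def make_robot_checker(robots_txt_lines, user_agent="PyCrawler"):
    """Build a can_fetch(url) predicate from the lines of a site's robots.txt.

    user_agent is a hypothetical name; rules under 'User-agent: *' apply to it
    unless the file has a more specific entry.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt_lines)
    return lambda url: parser.can_fetch(user_agent, url)
```

In a real crawl the robots.txt lines would come from each host's /robots.txt. RobotFileParser.set_url() followed by read() does this. Passing the lines to parse() keeps the example offline.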

How to Run

To run: $ python PyCrawler.py
Results are written to the file results.txt in the current directory
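
One way a breadth-first crawl could produce results.txt is sketched below. It assumes Python 3 and one visited URL per line; the actual output format is not documented here. get_links is a hypothetical callable that stands in for fetching a page and extracting its links. The limit and wait parameters correspond to the -l and -w arguments described under Command Line Arguments.

```python
import time
from collections import deque


def crawl(source, get_links, limit=None, wait=0.0, output_path="results.txt"):
    """Breadth-first crawl starting at source (a sketch, not PyCrawler's code).

    get_links(url) is a hypothetical callable returning the absolute URLs
    linked from url; in a real crawler it would fetch and parse the page.
    Each visited URL is written to output_path, one per line (assumed format).
    """
    seen = {source}             # URLs already queued, to avoid revisits
    frontier = deque([source])  # FIFO queue gives breadth-first order
    visited = []
    while frontier and (limit is None or len(visited) < limit):
        url = frontier.popleft()
        if visited and wait > 0:
            time.sleep(wait)  # politeness delay between consecutive requests
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    with open(output_path, "w") as out:
        for url in visited:
            out.write(url + "\n")
    return visited
```

For example, crawl("http://example.com/", get_links, limit=100, wait=1.0) would visit at most 100 URLs and pause one second between requests.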

Command Line Arguments

-s, --source The URL to start at

-l, --limit Limit the number of URLs crawled

-w, --wait Time, in seconds, to wait between requests

-d, --domain Limit crawled URLs to the given domain
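
The flags listed above could be parsed with argparse, as in the hypothetical setup below. It is written for Python 3, although argparse is also available in 2.7. The defaults are assumptions, because the README does not state a default start URL, limit, or wait. The rule that --domain matches the host itself or any of its subdomains is also an assumption rather than documented PyCrawler behavior.

```python
import argparse
from urllib.parse import urlparse


def build_parser():
    """Argument parser mirroring PyCrawler's documented flags (defaults assumed)."""
    parser = argparse.ArgumentParser(description="A web crawler written in Python")
    parser.add_argument("-s", "--source", default=None,
                        help="The URL to start at")
    parser.add_argument("-l", "--limit", type=int, default=None,
                        help="Limit the number of URLs crawled")
    parser.add_argument("-w", "--wait", type=float, default=0.0,
                        help="Time, in seconds, to wait between requests")
    parser.add_argument("-d", "--domain", default=None,
                        help="Limit crawled URLs to the given domain")
    return parser


def in_domain(url, domain):
    """Assumed --domain semantics: the URL's host is domain or a subdomain of it."""
    if not domain:
        return True  # no restriction requested
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)
```

With this parser, build_parser().parse_args() reads sys.argv, so an invocation such as python PyCrawler.py -s http://example.com/ -l 50 -w 2 -d example.com would yield limit=50 and wait=2.0. Matching on the "." + domain suffix keeps a host like notexample.com from passing as example.com.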
