GitHub - dnephin/Threaded-Crawler

conf
doc
external_lib
lib/crawler
test
.gitignore
BUGS
MANIFEST.in
README
TODO
cmd
setup.py
tcrawler


Threaded Crawler

This web crawler is designed to be generic and highly configurable. It can
quickly traverse sites and pull content based on regex and other selection criteria.
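
The general idea (several threads traversing pages, with regexes deciding which
links to follow and which content to pull) could be sketched roughly as below.
This is only an illustration and not the code in lib/crawler: the PAGES dict,
the crawl() function, and both regexes are hypothetical, and pages come from an
in-memory dict instead of HTTP so the sketch runs without a network.

```python
import queue
import re
import threading

# Hypothetical in-memory "site" (URL -> HTML); a real crawler would fetch over HTTP.
PAGES = {
    "/": '<a href="/jobs/1">Job 1</a> <a href="/about">About</a>',
    "/jobs/1": '<h1>Engineer</h1> <a href="/jobs/2">Next</a>',
    "/jobs/2": '<h1>Editor</h1> <a href="/">Home</a>',
    "/about": "<h1>About us</h1>",
}

LINK_RE = re.compile(r'href="([^"]+)"')   # links found on a page
TITLE_RE = re.compile(r"<h1>(.*?)</h1>")  # content to pull from a page


def crawl(start, follow_re, num_threads=4, fetch=PAGES.get):
    """Visit pages reachable from `start`, following only links matching `follow_re`."""
    todo = queue.Queue()
    seen = {start}
    results = {}
    lock = threading.Lock()  # guards `seen` and `results`
    todo.put(start)

    def worker():
        while True:
            url = todo.get()
            try:
                if url is None:  # sentinel: no more work
                    return
                html = fetch(url) or ""
                titles = TITLE_RE.findall(html)
                with lock:
                    if titles:
                        results[url] = titles
                    for link in LINK_RE.findall(html):
                        if follow_re.search(link) and link not in seen:
                            seen.add(link)
                            todo.put(link)
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    todo.join()              # wait until every queued page has been processed
    for _ in threads:
        todo.put(None)       # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(crawl("/", re.compile(r"^/jobs/")))
```

New links are queued before the current page is marked done, so the queue's
unfinished-task count cannot reach zero while reachable pages remain.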

__Requirements__

Uses BeautifulSoup to parse HTML pages (http://www.crummy.com/software/BeautifulSoup/)
Uses epydoc for documentation
Uses JobSite common package

python-psycopg2 2.0.8

__Development__

The 'cmd' script can be used to clean and build docs.
Documentation is in doc/API.


__Install__

python setup.py install


__Running__

The $COMMON environment variable should be set to the path for the
common/patterns.py lib, or the lib should be installed on the default Python path.
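
One way a script might honor $COMMON is sketched below. It is an assumption,
not taken from tcrawler: the add_common_to_path() helper is hypothetical, and
it assumes $COMMON names a directory that should be placed on the module
search path.

```python
import os
import sys


def add_common_to_path(environ=os.environ, path=sys.path):
    """Prepend $COMMON to the module search path when it is set.

    Hypothetical helper; assumes $COMMON is a directory to import from.
    Returns the value of $COMMON, or None if it is unset or empty.
    """
    common = environ.get("COMMON") or None
    if common and common not in path:
        path.insert(0, common)
    return common


if __name__ == "__main__":
    if add_common_to_path() is None:
        print("COMMON is not set; relying on the default Python path")
```

If $COMMON is unset, nothing changes and imports fall back to whatever is
already installed on the default Python path.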

