Frontera

Overview

Frontera is a framework that implements a crawl frontier. It was designed with Scrapy in mind, but it can be used with any web crawling project.

Frontera takes care of the logic and policies to follow during the crawl. It stores and prioritises links extracted by the crawler to decide which pages to visit next.
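To make the idea of a crawl frontier concrete, here is a minimal in-memory sketch of "store and prioritise links, then decide which pages to visit next". It is not Frontera's API: the class `ToyFrontier`, its methods `add_links` and `next_urls`, and the numeric `score` attached to each link are all hypothetical names chosen for illustration, and the example assumes that a higher score means the page should be visited sooner.

```python
import heapq
import itertools


class ToyFrontier:
    """Hypothetical in-memory crawl frontier (illustration only, not Frontera's API)."""

    def __init__(self):
        self._heap = []                    # entries: (negated score, insertion order, url)
        self._seen = set()                 # URLs already stored, used to skip duplicates
        self._counter = itertools.count()  # tie-breaker so equal scores pop in insertion order

    def add_links(self, links):
        """Store extracted links; `links` is an iterable of (url, score) pairs."""
        for url, score in links:
            if url in self._seen:
                continue
            self._seen.add(url)
            # heapq is a min-heap, so the score is negated to pop the highest score first.
            heapq.heappush(self._heap, (-score, next(self._counter), url))

    def next_urls(self, max_n):
        """Return up to `max_n` stored URLs, highest score first."""
        batch = []
        while self._heap and len(batch) < max_n:
            _, _, url = heapq.heappop(self._heap)
            batch.append(url)
        return batch
```

A crawler would call `add_links` with the links it extracts from each downloaded page and `next_urls` whenever it needs more work. A real frontier also has to handle persistence, revisiting and politeness policies, which this sketch leaves out.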

Installation

$ pip install frontera

Documentation

See http://frontera.readthedocs.org/

EuroPython presentation: http://www.slideshare.net/sixtyone/fronteraopen-source-large-scale-web-crawling-framework

Google groups

See https://groups.google.com/a/scrapinghub.com/forum/#!forum/frontera

Distributed extension

GitHub: https://github.com/scrapinghub/distributed-frontera

Read the Docs: http://distributed-frontera.readthedocs.org/
