
chris-zen/frontera

 
 


Frontera

Overview

Frontera is a crawl frontier framework. It was designed with Scrapy in mind, but it can be used in any web crawling project.

Frontera takes care of the logic and policies to follow during the crawl. It stores and prioritises links extracted by the crawler to decide which pages to visit next.
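To make the idea of a crawl frontier concrete, here is a minimal sketch of the "store, prioritise, decide what to visit next" loop described above. It is not Frontera's API: the class name `SimpleFrontier`, its methods, and the numeric scores are hypothetical and exist only for illustration, and a real frontier would also handle persistence, revisiting policies and politeness.

```python
import heapq
import itertools


class SimpleFrontier:
    """Toy crawl frontier (hypothetical, not Frontera's API).

    Stores extracted links, drops URLs it has already seen, and hands
    back the highest-scored URLs first.
    """

    def __init__(self):
        self._heap = []                    # entries: (-score, insertion order, url)
        self._seen = set()                 # every URL ever added
        self._counter = itertools.count()  # tie-breaker: first added, first out

    def add_links(self, links):
        """Add an iterable of (url, score) pairs; a higher score is visited sooner."""
        for url, score in links:
            if url in self._seen:
                continue  # already queued or already handed out
            self._seen.add(url)
            # heapq is a min-heap, so negate the score to pop the largest first.
            heapq.heappush(self._heap, (-score, next(self._counter), url))

    def next_urls(self, max_n):
        """Return up to max_n URLs to crawl next, best score first."""
        batch = []
        while self._heap and len(batch) < max_n:
            _, _, url = heapq.heappop(self._heap)
            batch.append(url)
        return batch

    def __len__(self):
        return len(self._heap)
```

In this sketch a crawler would call `add_links()` with the links it extracted from each downloaded page and `next_urls()` whenever it is ready to fetch another batch. Swapping the scoring rule changes the crawl order without touching the crawler itself, which is the separation of concerns a frontier provides.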

Installation

$ pip install frontera

Documentation

See http://frontera.readthedocs.org/

EuroPython presentation: http://www.slideshare.net/sixtyone/fronteraopen-source-large-scale-web-crawling-framework

Google Group

See https://groups.google.com/a/scrapinghub.com/forum/#!forum/frontera

Distributed extension

GitHub: https://github.com/scrapinghub/distributed-frontera

RTD: http://distributed-frontera.readthedocs.org/

About

A flexible frontier for web crawlers


Languages

  • Python 100.0%