This is the official repository of the data crawlers and parsers developed for the CUTLER project. In this repo you will find the crawlers and their technical documentation. Please refer also to the User Manual for Data Crawling Software.
A fairly detailed description of the data sources and crawlers is available in deliverables D3.2 and D3.3, accessible via the Deliverables page of the project website.
The crawlers are grouped into different folders according to the type of data crawled:
- Economic contains crawlers and other software related to economic data, along with instructions to run them
- Environmental contains crawlers and other software related to environmental data, along with instructions to run them
- Social contains crawlers and other software related to social data, along with instructions to run them
The crawlers have been implemented in different programming languages (R, Python, JavaScript, Java). They are used to ingest data into either a Hadoop Distributed File System (HDFS) or Elasticsearch; however, most of the crawlers can also be run stand-alone. More specific documentation can be found under the folders listed above.
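As a rough illustration of the Elasticsearch ingestion path, the sketch below builds a request body for the Elasticsearch Bulk API, whose format is newline-delimited JSON (an action line followed by the document itself). This is a generic, hypothetical example, not code from the crawlers in this repo; the index name `cutler-economic` and the sample record are assumptions.

```python
import json

def to_bulk_body(records, index="cutler-economic"):
    """Serialize crawled records into an Elasticsearch bulk-index body.

    The Bulk API expects one action line per document, followed by the
    document source, all newline-delimited, with a trailing newline.
    Index name "cutler-economic" is a placeholder, not a real CUTLER index.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    # The Bulk API requires a newline after the last line.
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical crawled record, for illustration only.
    crawled = [{"city": "Thessaloniki", "indicator": "employment", "value": 0.61}]
    print(to_bulk_body(crawled))
```

A body produced this way would then be POSTed to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header; the stand-alone mode of a crawler would instead write the records to local files.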
General information on deployment in Hadoop can be found in the following folder:
- HadoopDeployment: scripts, configuration files and instructions related to data ingestion into/from Hadoop HDFS