CUTLER Data Crawlers

This is the official repository of the data crawlers and parsers developed for the CUTLER project. In this repo you will find the crawlers and their technical documentation. Please also refer to the User Manual for Data Crawling Software.

A fairly detailed description of the data sources and crawlers is available in deliverables D3.2 and D3.3, accessible via the Deliverables page of the project website.

Project Structure

Crawlers

The crawlers are grouped into folders according to the type of data crawled:

  • Economic contains crawlers and other software related to economic data, along with instructions for running them
  • Environmental contains crawlers and other software related to environmental data, along with instructions for running them
  • Social contains crawlers and other software related to social data, along with instructions for running them

The crawlers have been implemented in several programming languages (R, Python, JavaScript, Java). They are used to ingest data into either a Hadoop Distributed File System (HDFS) or Elasticsearch; however, most of the crawlers can also be run stand-alone. More specific documentation can be found under the folders listed above.
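To make the Elasticsearch side of this flow concrete, here is a minimal sketch of pushing crawled records into an index with the official Python client. It is only an illustration: the host, index name, and record fields are hypothetical placeholders, not values taken from this repository; each crawler's own documentation describes its actual output and target.

    # Minimal sketch: indexing crawled records into Elasticsearch.
    # Hypothetical example; the host, index name, and record fields
    # are placeholders rather than values used by the CUTLER crawlers.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # A crawler would normally produce these records by scraping a site
    # or calling an external API; here we use a single dummy record.
    records = [
        {"source": "example-sensor",
         "timestamp": "2020-01-01T00:00:00Z",
         "value": 42.0},
    ]

    # Index each record, using its position in the batch as the document id.
    for i, record in enumerate(records):
        es.index(index="crawled-data", id=i, document=record)

The HDFS path of the same flow is sketched in the Hadoop deployment section below.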

Deployment in Hadoop

General information on deployment in Hadoop can be found in the following folder:

  • HadoopDeployment: scripts, configuration files, and instructions related to data ingestion into and from Hadoop HDFS
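For a rough idea of what ingestion into HDFS can look like from Python, the sketch below uses the hdfs package from PyPI over WebHDFS. The NameNode address, user, and paths are hypothetical placeholders; the actual scripts and configuration used by the project live in the HadoopDeployment folder.

    # Minimal sketch: uploading a crawled file to HDFS over WebHDFS.
    # Hypothetical example; the NameNode URL, user, and paths are
    # placeholders. See the HadoopDeployment folder for the real setup.
    from hdfs import InsecureClient

    # WebHDFS endpoint of the NameNode (port 9870 on Hadoop 3.x).
    client = InsecureClient("http://namenode.example.org:9870", user="crawler")

    # Copy a local crawler output file into HDFS, overwriting if present.
    client.upload("/data/crawled/economic/output.json",
                  "output.json", overwrite=True)

    # Reading the data back out of HDFS works the same way in reverse.
    with client.read("/data/crawled/economic/output.json") as reader:
        print(reader.read()[:200])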
