This is the official repository of the data crawlers and parsers developed for the CUTLER project. In this repo you will find the crawlers and their technical documentation. Please refer also to the User Manual for Data Crawling Software.
A fairly detailed description of the data sources and crawlers is available in deliverables D3.2 and D3.3, accessible via the Deliverables page of the project website.
The crawlers are grouped into different folders according to the type of data crawled:
- Economic contains crawlers and other software related to economic data, along with instructions to run them
- Environmental contains crawlers and other software related to environmental data, along with instructions to run them
- Social contains crawlers and other software related to social data, along with instructions to run them
The crawlers have been implemented in different programming languages (R, Python, JavaScript, Java). They are used to ingest data into either a Hadoop Distributed File System (HDFS) or Elasticsearch; however, most of the crawlers can also be run stand-alone. More specific documentation can be found under the folders listed above.
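As a rough illustration of the Elasticsearch ingestion path, the sketch below builds a request body for the Elasticsearch Bulk API, whose format is newline-delimited JSON (an action line followed by the document itself). This is a generic, hypothetical example, not code from the crawlers in this repo; the index name `cutler-economic` and the sample record are assumptions.

```python
import json

def to_bulk_body(records, index="cutler-economic"):
    """Serialize crawled records into an Elasticsearch bulk-index body.

    The Bulk API expects one action line per document, followed by the
    document source, all newline-delimited, with a trailing newline.
    Index name "cutler-economic" is a placeholder, not a real CUTLER index.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    # The Bulk API requires a newline after the last line.
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical crawled record, for illustration only.
    crawled = [{"city": "Thessaloniki", "indicator": "employment", "value": 0.61}]
    print(to_bulk_body(crawled))
```

A body produced this way would then be POSTed to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header; the stand-alone mode of a crawler would instead write the records to local files.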
General information on deployment in Hadoop can be found in the following folder:
- HadoopDeployment: scripts, configuration files and instructions related to data ingestion into/from Hadoop HDFS