rabbda-earthquakes-realtime

Introduction

This application is part of a series of solutions that aim to demonstrate how Big Data technologies can be used to build complex, real-life Big Data applications.

Specifically, this application demonstrates how to acquire real-time data from REST APIs and store it in Hadoop HDFS.

The data source for this demo is earthquake data provided by USGS (the U.S. Geological Survey, "science for a changing world").

USGS provides a REST API that we use to request earthquake data. The data can be requested in CSV format; for example, USGS publishes an all-earthquakes hourly summary CSV feed at https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv.

The steps to store this data in HDFS are the following (a sketch of steps 1-3 follows the list):

  1. Request the data from the REST API.
  2. Pre-process the data to remove headers and format the earthquake date and time.
  3. Temporarily save the data on the host machine.
  4. Upload the data to HDFS.
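
Conceptually, steps 1-3 look like the minimal Python sketch below. The feed URL, the 10-minute default interval, and the data/earthquakes.csv output path follow this README, but the sketch is illustrative only; the actual implementation lives in the earthquakes folder.

import csv
import io
import os
import time

import requests  # assumed to be listed in requirements.txt

FEED_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv"
OUTPUT = "data/earthquakes.csv"

def fetch_and_store():
    # Step 1: request the data from the REST API.
    text = requests.get(FEED_URL, timeout=30).text
    reader = csv.reader(io.StringIO(text))
    next(reader)  # Step 2a: drop the CSV header row
    os.makedirs("data", exist_ok=True)
    with open(OUTPUT, "a", newline="") as f:
        writer = csv.writer(f)
        for row in reader:
            # Step 2b: split the ISO 8601 timestamp (e.g. 2023-01-01T12:30:45.000Z)
            # into separate date and time columns (an illustrative formatting choice).
            date_part, time_part = row[0].rstrip("Z").split("T")
            # Step 3: append the pre-processed record to the local file.
            writer.writerow([date_part, time_part] + row[1:])

while True:
    fetch_and_store()
    time.sleep(10 * 60)  # by default, one request every 10 minutes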

Getting started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Download the repository

The first step is to download the repository onto your Hadoop machine. To do so, run the following command in a terminal:

git clone https://github.com/UoW-CPC/rabbda-earthquakes-realtime.git

Running the application

Having downloaded the repository, you can now run the application. First, move to the working directory by executing:

cd rabbda-earthquakes-realtime

Now execute the command:

ls

You will see one folder and three files:

  • earthquakes, a folder containing the Python scripts that perform steps 1-3 from the introduction.
  • requirements.txt, the list of packages required by the Python scripts.
  • flume-earthquakes-realtime.conf, the file used by the Flume service to perform step 4 from the introduction.
  • README.md, the project description file.

Requirements installation

Next, install the requirements by running:

pip install -r requirements.txt

Run the Python application

Having installed the requirements, you can now run the Python application.

Move to the earthquakes folder:

cd earthquakes

and execute the earthquakes script:

python earthquakes.py

By default, the script makes a request every 10 minutes. Alternatively, you can pass a parameter to change this interval. Example:

python earthquakes.py 2

Now a request is made every 2 minutes.
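
One plausible way to support this optional argument (a sketch; the actual script's argument handling may differ):

import sys

# Polling interval in minutes: optional first command-line argument, default 10.
interval_minutes = int(sys.argv[1]) if len(sys.argv) > 1 else 10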

To see the results, open a new terminal and move to the repository directory. You will see a new directory, data, which contains a file called earthquakes.csv.

To see its content run the following command:

cat earthquakes.csv

Alternatively, you can monitor file changes with the command:

tail -F earthquakes.csv

At this point, the data is temporarily stored on the local machine.

Run the Flume Agent

The next step is to upload this data to HDFS. To do so, we use the Flume service. Open a new terminal and move once again to the rabbda-earthquakes-realtime directory.

There, edit the flume-earthquakes-realtime.conf file. Specifically, set eq.sources.r1.command and eq.sinks.k1.hdfs.path to match your local environment.

Example:

eq.sources.r1.command = tail -F /home/user/rabbda-earthquakes-realtime/data/earthquakes.csv
eq.sinks.k1.hdfs.path = hdfs://NameNode.Domain.com:8020/user/UserName/flume/realtime
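
For context, a complete agent definition along these lines pairs an exec source with an HDFS sink through a memory channel. The following is a generic sketch with assumed channel capacities and sink settings, not the repository's exact file:

eq.sources = r1
eq.channels = c1
eq.sinks = k1

eq.sources.r1.type = exec
eq.sources.r1.command = tail -F /home/user/rabbda-earthquakes-realtime/data/earthquakes.csv
eq.sources.r1.channels = c1

eq.channels.c1.type = memory
eq.channels.c1.capacity = 10000
eq.channels.c1.transactionCapacity = 1000

eq.sinks.k1.type = hdfs
eq.sinks.k1.channel = c1
eq.sinks.k1.hdfs.path = hdfs://NameNode.Domain.com:8020/user/UserName/flume/realtime
eq.sinks.k1.hdfs.fileType = DataStream
eq.sinks.k1.hdfs.writeFormat = Text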

Now it is time to start the Flume agent and upload the data to HDFS. Execute:

flume-ng agent --name eq --conf-file flume-earthquakes-realtime.conf

The Flume agent now monitors the earthquakes.csv file for changes and uploads the new data to HDFS.

Verify the data in HDFS

Finally, go to Ambari Files View at the path specified above and watch the data sink into HDFS in real time.
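
Alternatively, you can inspect the sink directory from a terminal with the standard HDFS CLI (using the path configured above):

hdfs dfs -ls /user/UserName/flume/realtime
hdfs dfs -cat /user/UserName/flume/realtime/*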

Architecture

(Architecture diagram)

Demo video
