forked from rickseeger/jaunt

Data pipeline and API for accessing nearby amenities.

jeetgangele/jaunt

Jaunt

Find places, fast!

OpenStreetMap mirror servers are polled in order of reliability, and the 35 GB compressed XML file is downloaded and pushed into HDFS.
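
The polling step can be sketched as a simple ordered fallback. This is a minimal sketch for Python 3, not the project's fetcher.py: the mirror URLs, the function names, and the injected `download` callable are all hypothetical, and the HDFS step only builds the `hdfs dfs -put` command rather than running it.

```python
# Hypothetical mirror list, most reliable first (placeholder URLs).
MIRRORS = [
    "https://mirror-a.example/planet-latest.osm.bz2",
    "https://mirror-b.example/planet-latest.osm.bz2",
]


def fetch_from_first_working_mirror(mirrors, download):
    """Try each mirror in order; return (url, result) from the first success."""
    errors = []
    for url in mirrors:
        try:
            return url, download(url)
        except OSError as exc:  # urllib.error.URLError is an OSError subclass
            errors.append((url, exc))
    raise RuntimeError("all mirrors failed: %r" % errors)


def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list that copies a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]
```

In a real run, `download` could wrap `urllib.request.urlretrieve`, and the command list could be passed to `subprocess.check_call`.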

A Hadoop Streaming job parses the XML file and inserts location data into HBase through its REST API.
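
To illustrate the parsing half of this step, here is a rough sketch that pulls amenity nodes out of OSM XML with the standard library. It assumes the parser sees a well-formed document (a real Hadoop Streaming mapper receives line-oriented splits and would need to handle that), and it assumes locations are identified by the OSM `amenity` tag. The function name is hypothetical and the HBase insert is left out.

```python
import io
import xml.etree.ElementTree as ET


def iter_amenities(stream):
    """Yield (node_id, lat, lon, amenity) for each OSM node with an amenity tag."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "node":
            continue
        # OSM stores node metadata as child elements: <tag k="..." v="..."/>
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        if "amenity" in tags:
            yield (elem.get("id"), float(elem.get("lat")),
                   float(elem.get("lon")), tags["amenity"])
        elem.clear()  # drop the node's children and attributes once processed


SAMPLE = b"""<osm>
  <node id="1" lat="40.7484" lon="-73.9857">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="40.7490" lon="-73.9860"/>
</osm>"""

amenities = list(iter_amenities(io.BytesIO(SAMPLE)))
```

Only the first node in the sample carries an `amenity` tag, so it is the only one yielded.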

A Flask API serves requests, pulling records from HBase as needed to find all nearby locations of a specific type.
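
The README does not say how "nearby" is measured, so the following is only one plausible sketch of the filtering step: keep records of the requested type and sort them by great-circle (haversine) distance. The record layout (`lat`, `lon`, `type` keys), the function names, and the radius parameter are assumptions; the Flask routing and HBase reads are omitted.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(records, lat, lon, kind, radius_km):
    """Return records of the given type within radius_km, nearest first."""
    hits = [(haversine_km(lat, lon, r["lat"], r["lon"]), r)
            for r in records if r["type"] == kind]
    return [r for dist, r in sorted(hits, key=lambda h: h[0]) if dist <= radius_km]
```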

The Earth is divided into tiles at six resolutions. The tiles are fixed and non-overlapping, ranging in width from 0.01 degrees to 0.32 degrees. The north-west corner of each tile is aligned so that CornerLat % TileWidth = 0 and CornerLon % TileWidth = 0.
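
The alignment rule can be made concrete with a small sketch for Python 3. It assumes the six widths double at each step (0.01, 0.02, 0.04, 0.08, 0.16, 0.32), which matches the stated range but is not spelled out here, and it works in integer units of 1e-7 degrees so the modulo condition is exact. The names are hypothetical.

```python
UNITS = 10**7  # integer units per degree, so the modulo test avoids float error

# Tile widths in integer units; the doubling sequence is an ASSUMPTION.
TILE_WIDTHS = [w * UNITS // 100 for w in (1, 2, 4, 8, 16, 32)]


def tile_corner(lat, lon, width):
    """Return the north-west corner (lat, lon), in integer units, of the
    tile of the given width that contains the point."""
    lat_u = round(lat * UNITS)
    lon_u = round(lon * UNITS)
    corner_lat = -((-lat_u) // width) * width  # round up to the northern edge
    corner_lon = (lon_u // width) * width      # round down to the western edge
    return corner_lat, corner_lon
```

For example, the point (40.7484, -73.9857) falls in the 0.01-degree tile whose north-west corner is (40.75, -73.99).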

The mapper must therefore determine the six tiles a location belongs to, one per resolution, and emit a record for each.
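
A rough sketch of that emit step for Python 3 follows. The key format (`width:cornerLat:cornerLon` in integer units), the doubling widths, and the function name are assumptions made for illustration; only the tab-separated key/value layout is standard Hadoop Streaming behaviour.

```python
# Tile widths in hundredths of a degree; the doubling sequence is an ASSUMPTION.
WIDTHS_CENTI = (1, 2, 4, 8, 16, 32)


def map_location(lat, lon, payload):
    """Yield one 'key<TAB>payload' line per resolution for a single location."""
    # Work in integer units of 1e-7 degrees to keep tile alignment exact.
    lat_u, lon_u = round(lat * 10**7), round(lon * 10**7)
    for centi in WIDTHS_CENTI:
        width = centi * 10**5
        north = -((-lat_u) // width) * width  # northern edge of the tile
        west = (lon_u // width) * width       # western edge of the tile
        yield "%d:%d:%d\t%s" % (centi, north, west, payload)


lines = list(map_location(40.7484, -73.9857, "cafe"))
```

A streaming mapper would print each yielded line to stdout; Hadoop then sorts the lines by key before they reach the reducer.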

The reducer gathers the locations belonging to each tile, keeps only the first 20, and inserts them into HBase.
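
The grouping and truncation can be sketched as below. It relies on Hadoop Streaming delivering reducer input sorted by key, so consecutive lines with the same key form one tile. The `store` callback is a hypothetical stand-in for the HBase write, and the function name is an assumption.

```python
from itertools import groupby, islice

MAX_PER_TILE = 20  # the README keeps only the first 20 locations per tile


def reduce_tiles(lines, store):
    """Group sorted 'key<TAB>value' lines by key and store the first 20 values."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        kept = [value for _key, value in islice(group, MAX_PER_TILE)]
        store(key, kept)  # hypothetical sink; the real job writes to HBase
```

In a streaming job, `lines` would be `sys.stdin` and `store` would issue the insert against the HBase REST API.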

Cluster config

  • 4-node cluster on AWS
  • Cloudera v5.02
  • HBase 0.96

Additional software on master node

$ sudo apt-get install python-pip expect-dev
$ sudo pip install starbase flask jsonschema

urllib2 is part of the Python 2 standard library and does not need to be installed with pip.

Scripts

  • Schedule fetcher.py to run weekly with cron
  • Launch the API on port 5000 with $ nohup python api.py >> /var/log/jaunt-api.log &
  • Launch the demo as root on port 80 with # nohup python demo.py >> /var/log/jaunt-demo.log &
