Trucker
=======

Truck Transport Benefit Tracker. Documentation is a work in progress.

[www.truckers.solutions](http://www.truckers.solutions/)

Table of Contents:

  1. [Introduction](README.md#1-introduction)
  2. [How it Works](README.md#2-how-it-works)
  3. [Architecture](README.md#3-architecture)
  4. [Startup Protocol](README.md#4-startup-protocol)

1. Introduction

Trucker is a tool for tracking the truck transport system across US cities, allowing carriers and brokers to make informed decisions that maximize their benefit by comparing the available load against the number of trucks in each city. The tool provides both real-time and historical views of trucks/loads using the following tools:

  • Apache Kafka
  • Apache HDFS
  • Hive
  • Apache Storm
  • Apache Cassandra
  • Flask with simplemap, HighCharts, Bootstrap and Ajax

2. How it Works

Trucker uses two sources of engineered data, namely inbound data (trucks/loads coming into a city) and outbound data (trucks/loads leaving a city), for both the historical and real-time paths. For the historical path, it ingests ~200GB of data covering the years 2004 to 2014. Real-time data is processed as it arrives so that the current status is reflected directly to the user, while the same data is pushed in micro-batches to the batch layer.

[Trucker Demo](https://www.youtube.com/watch?v=T2laLdZ9gTg&spfreload=10)

3. Architecture

![Architecture](img/architecture.png)

# Data Generation

Two sources of data (inbound and outbound) were generated, which look like the following:

![Inbound data](img/inbound_data.png) ![Outbound data](img/outbound_data.png)

  • Timestamps are randomized over a period of 10 years (2004 - 2014).
  • Source and destination cities were chosen at random, with the randomization biased toward major cities (cities with a population of 6000+, per [http://www.city-data.com/](http://www.city-data.com/)). All cities were obtained from All_cities, which contains 42742 cities.
  • To get the mileage (distance) between cities, the latitude and longitude of each city were used to approximate the distance with the Vincenty method, which gives distances fairly similar to those from the Google API (see the sketch after this list). The Google API was avoided mainly because its call limits make it impossible to generate a huge amount of data in a short time.
  • Running multiple instances of the random data generation script makes it easy to generate huge inbound and outbound datasets in a short time.
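For illustration, here is a minimal sketch of the distance step using the geopy package (an assumption; the original generation scripts are not shown). Recent geopy versions replace the deprecated `vincenty` with `geodesic`, which behaves similarly:

```python
from geopy.distance import geodesic  # successor to geopy's deprecated vincenty()

# Hypothetical coordinates; real runs would look these up per city in All_cities.
chicago = (41.8781, -87.6298)
dallas = (32.7767, -96.7970)

# Geodesic (Vincenty-style) distance on the ellipsoid, in miles.
miles = geodesic(chicago, dallas).miles
print(round(miles))  # roughly 800 miles
```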

# Data Ingestion

CSV messages were produced and consumed by Python scripts using the kafka-python package from https://github.com/mumrah/kafka-python.git. Messages were published to two topics (inbound and outbound), with Storm streaming and HDFS acting as consumers. Messages were blocked into 20MB micro-batches and then pushed to HDFS, making them available to the batch layer when performing a complete recomputation.

# Batch Processing

Batch processing counts inbound and outbound trucks for each city and state using the historical data from 2004 - 2014. Aggregate views for serving queries at the user end are generated with Hive (see the sketch after the batch tables below). The aggregated data is then written directly to Cassandra using the Python client library for Apache Cassandra.

# Real-Time Processing

There are two topologies (inbound and outbound) for processing real-time data, each comprising a Kafka spout and a bolt (with a tick interval frequency of 5 seconds). The spout takes data from Kafka in real time, and the bolt takes tuples from the spout and processes them before writing to Cassandra (see the bolt sketch after the real-time tables below). The Pyleus Storm driver from https://github.com/Yelp/pyleus was used on top of Apache Storm to allow a Python implementation.

# Cassandra Schema

Primary keys and partition keys are chosen to maximize efficient access of tuples from Cassandra.
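For illustration, here is a minimal sketch of how the first batch table below might be declared and queried with the Python Cassandra driver. The column types and exact key layout are assumptions: the key notation in the lists below is ambiguous, and this sketch reads (c_state, c_year) as the partition key with c_city as a clustering column.

```python
from cassandra.cluster import Cluster

# Connect to a local node; the address and keyspace name are assumptions.
session = Cluster(["127.0.0.1"]).connect("trucker")

# One plausible reading of outbound_city_state: partition by (c_state, c_year),
# cluster by c_city, so all per-city counts for a state/year share a partition.
session.execute("""
    CREATE TABLE IF NOT EXISTS outbound_city_state (
        c_city  text,
        c_state text,
        c_year  int,
        c_count bigint,
        PRIMARY KEY ((c_state, c_year), c_city)
    )
""")

# Serving-layer query: per-city outbound counts for Illinois in 2010.
rows = session.execute(
    "SELECT c_city, c_count FROM outbound_city_state"
    " WHERE c_state = %s AND c_year = %s",
    ("IL", 2010),
)
```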

Batch Tables

  1. outbound_city_state: c_city (PK), (c_state, c_year) (partition key), c_count
  2. outbound_city_state_major: c_city (PK), (c_state, c_year) (partition key), c_count
  3. inbound_city_state: c_city (PK), (c_state, c_year) (partition key), c_count
  4. inbound_city_state_major: c_city (PK), (c_state, c_year) (partition key), c_count
  5. inbound_state: c_year (PK), c_state (partition key), c_count
  6. outbound_state: c_year (PK), c_state (partition key), c_count
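A minimal sketch of the Hive aggregation that could back these batch tables, issued from Python via PyHive. The raw table name, its timestamp column, and the connection details are assumptions:

```python
from pyhive import hive

# Connect to HiveServer2; host and port are assumptions for a local setup.
cursor = hive.connect(host="localhost", port=10000).cursor()

# Count outbound trucks per city, state, and year over the raw HDFS data.
# 'outbound_raw' and its 'ts' timestamp column are hypothetical names.
cursor.execute("""
    SELECT c_city, c_state, year(ts) AS c_year, COUNT(*) AS c_count
    FROM outbound_raw
    GROUP BY c_city, c_state, year(ts)
""")
rows = cursor.fetchall()  # these aggregates are then written to Cassandra
```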

Real Time Tables

  1. outbound_real_count: c_city (PK), c_state, c_month_day_year (partition key), c_count
  2. inbound_real_count: c_city (PK), c_state, c_month_day_year (partition key), c_count
  3. inbound_real_count_state: c_state (PK), c_month_day_year (partition key), c_count
  4. outbound_real_count_state: c_state (PK), c_month_day_year (partition key), c_count
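To illustrate how the real-time counts above might be produced, here is a minimal Pyleus bolt sketch. The incoming tuple field order is an assumption, the Cassandra write is stubbed, and the 5-second tick would be configured via tick_freq_secs in the topology YAML:

```python
from collections import Counter

from pyleus.storm import SimpleBolt


def write_to_cassandra(city, state, mdy, count):
    """Stub for the write to outbound_real_count (real code would use the
    Python Cassandra driver, as in the batch layer)."""


class OutboundCountBolt(SimpleBolt):
    """Accumulates outbound truck counts per (city, state, day) and flushes
    them to Cassandra on every tick."""

    def initialize(self):
        self.counts = Counter()

    def process_tuple(self, tup):
        # Assumed field order in the tuples emitted by the Kafka spout.
        city, state, month_day_year = tup.values[:3]
        self.counts[(city, state, month_day_year)] += 1

    def process_tick(self):
        # Called on each tick tuple; flush and reset the running counts.
        for (city, state, mdy), count in self.counts.items():
            write_to_cassandra(city, state, mdy, count)
        self.counts.clear()


if __name__ == "__main__":
    OutboundCountBolt().run()
```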

4. Startup Protocol

  1. Kafka Server
  2. Storm Streaming (Submit topologies to name node)
  3. HDFS kafka consumer
  4. Kafka message producer (see the producer sketch below)
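A minimal sketch of the producer step using the current kafka-python API (the repo references the older mumrah/kafka-python interface; the broker address and the record fields are assumptions):

```python
import csv
import io

from kafka import KafkaProducer

# Broker address is an assumption for a local single-node setup.
producer = KafkaProducer(bootstrap_servers="localhost:9092")


def send_record(topic, row):
    """Serialize one generated record as a CSV line and publish it."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    producer.send(topic, buf.getvalue().encode("utf-8"))


# Hypothetical outbound record: city, state, date; the real generator
# emits more fields (timestamp, destination, miles, ...).
send_record("outbound", ["Chicago", "IL", "06_15_2010"])
producer.flush()
```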
