kokje/rendr

Rendr

Project built during the Insight Data Engineering program

Work in progress; feedback is appreciated.

Index

  1. [Introduction](README.md#1-introduction)
  2. [AWS Clusters](README.md#2-aws-clusters)
  3. [Data Pipeline](README.md#3-data-pipeline)
  4. [Front End](README.md#4-front-end)

1. Introduction

Rendr is an application that builds a bipartite graph of users and restaurants and uses the structure of this network to make recommendations. A user is shown a restaurant based on how popular it is with people who are similar to that user.
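The idea can be illustrated with a small sketch in plain Python. This is not the project's Spark/GraphX code. The function name `recommend`, the sample edges, and the definition of "similar" (two users who share at least one restaurant) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def recommend(likes, user, top_n=3):
    """Rank restaurants the user has not visited by popularity among similar users.

    `likes` is a list of (user, restaurant) edges of the bipartite graph.
    Assumption for this sketch: two users are "similar" if they share
    at least one restaurant.
    """
    by_user = defaultdict(set)
    by_restaurant = defaultdict(set)
    for u, r in likes:
        by_user[u].add(r)
        by_restaurant[r].add(u)

    seen = by_user[user]

    # Users reachable in two hops: user -> restaurant -> other user.
    similar = set()
    for r in seen:
        similar |= by_restaurant[r]
    similar.discard(user)

    # Score each unseen restaurant by how many similar users like it.
    scores = Counter()
    for other in similar:
        for r in by_user[other] - seen:
            scores[r] += 1

    # Highest score first; ties broken by name for a deterministic order.
    return sorted(scores, key=lambda r: (-scores[r], r))[:top_n]


# Hypothetical sample edges.
edges = [
    ("ann", "taqueria"), ("ann", "ramen_bar"),
    ("bob", "taqueria"), ("bob", "pizzeria"),
    ("cat", "ramen_bar"), ("cat", "pizzeria"), ("cat", "diner"),
    ("dan", "bistro"),
]
print(recommend(edges, "ann"))  # ['pizzeria', 'diner']
```

Here `pizzeria` ranks first because both users who overlap with `ann` like it, while `bistro` never appears because `dan` shares no restaurant with her.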

Data Sources

  • Foursquare: data collected by University of Minnesota researchers and obtained from the Internet Archive, containing 2,153,471 users, 1,143,092 venues, 121,970 check-ins and 2,809,581 ratings that users assigned to venues, all extracted from the Foursquare application through the public API
  • Yelp: data obtained from the Yelp Academic Dataset Challenge, consisting of 1.6M reviews and 500K tips by 366K users for 61K businesses

2. AWS Clusters

Rendr is powered by three clusters on AWS-

3. Data Pipeline

Spark and GraphX are used for all batch processing. The Yelp and Foursquare data have very different schemas. The Foursquare data contains only the latitude and longitude of each venue and no other metadata, such as the name, city, or state, or whether the venue is a restaurant at all. It therefore has to be filtered against the Yelp data, which is much richer. Geohashing is used for entity resolution, to determine whether a rating in Foursquare refers to a restaurant in Yelp.
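A minimal sketch of geohash-based matching follows, in plain Python rather than Spark. The `geohash` function implements the standard geohash encoding. The `match_venues` helper, the precision of 7 characters, and the sample coordinates are assumptions for illustration, not the project's actual settings.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a standard geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits are interleaved, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon

    # Every 5 bits select one base-32 character.
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


def match_venues(foursquare_venues, yelp_businesses, precision=7):
    """Map Foursquare venue ids to a Yelp business id in the same geohash cell.

    Both inputs are lists of (id, lat, lon) tuples (hypothetical layout).
    """
    cells = {}
    for business_id, lat, lon in yelp_businesses:
        # Keep the first business seen in each cell.
        cells.setdefault(geohash(lat, lon, precision), business_id)

    matches = {}
    for venue_id, lat, lon in foursquare_venues:
        business_id = cells.get(geohash(lat, lon, precision))
        if business_id is not None:
            matches[venue_id] = business_id
    return matches


# Hypothetical sample records.
yelp = [("yelp_1", 37.77493, -122.41942)]
foursquare = [("fsq_a", 37.77495, -122.41940), ("fsq_b", 40.71278, -74.00594)]
print(match_venues(foursquare, yelp))  # {'fsq_a': 'yelp_1'}
```

Exact cell matching is the simplest form of this technique. Two points that are close together but fall on opposite sides of a cell boundary will not match, so a fuller version would typically also check neighboring cells.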

  • Serving Layer

Cassandra is used to save the batch results and serve the front end. Three main tables serve the application:

  • Seeds - key is the username and value is the ID of the restaurant most recently liked or reviewed by the user
  • Ranks - key is the restaurant ID and values are the ranks and IDs of other restaurants in the network
  • IdMapper - key is the restaurant ID and value is the restaurant's metadata, such as name, city, and state, which is needed to construct the query to the Yelp API
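One way the three tables could be chained to answer a request is sketched below. Plain dictionaries stand in for the Cassandra tables so the example runs without a cluster. The row layouts, the sample rows, and the function name `recommendations_for` are assumptions, not the project's actual schema.

```python
# In-memory stand-ins for the three Cassandra tables; all rows are made up.
seeds = {"ann": "r1"}                    # username -> most recent restaurant id
ranks = {"r1": [(2, "r3"), (1, "r7")]}   # restaurant id -> [(rank, other id)]
id_mapper = {                            # restaurant id -> metadata
    "r3": {"name": "Example Diner", "city": "Phoenix", "state": "AZ"},
    "r7": {"name": "Example Taqueria", "city": "Tempe", "state": "AZ"},
}


def recommendations_for(username, limit=5):
    """Follow Seeds -> Ranks -> IdMapper and return restaurant metadata by rank."""
    seed = seeds.get(username)
    if seed is None:
        return []  # unknown user: nothing to seed the lookup with

    # Sort (rank, id) pairs so the best-ranked restaurant comes first.
    ranked = sorted(ranks.get(seed, []))[:limit]

    results = []
    for rank, restaurant_id in ranked:
        metadata = id_mapper.get(restaurant_id)
        if metadata is not None:
            results.append(dict(metadata, id=restaurant_id, rank=rank))
    return results


for row in recommendations_for("ann"):
    print(row["rank"], row["name"], row["city"], row["state"])
```

The metadata returned by the last step is what the text above describes as the input for building the Yelp API query. That query itself is left out of the sketch.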

4. Front End

Flask is used for the front end, with JavaScript, HTML, and CSS for the views.
