moyang28/movieHarmony

moviEharmony

www.moviEharmony.com

Slideshow

video

moviEharmony.com is a data platform that finds movies two people may like to watch together. It is completely open-source and uses the following technologies:

  • Apache Kafka
  • Python
  • Amazon S3
  • Spark / Spark MLlib
  • Apache Cassandra
  • Flask

The moviEharmony Website

As of Oct 7, 2015, moviEharmony.com batch-processes the Amazon review dataset. These reviews provide the data that drives the following components of moviEharmony.com:

  • MovieSearch: Allows two users to find movies they may both like to watch together.

  • QueryUser: Allows users to see the movies they have reviewed in the past.

  • EnterReview: Allows users to add movie reviews.
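
How MovieSearch merges two people's tastes isn't spelled out here. One simple strategy, shown as a hypothetical sketch rather than moviEharmony's actual method, is "least misery": take each user's predicted ratings (for example, the estimated ratings the model writes to Cassandra) and rank movies by the lower of the two predictions, so a movie only ranks high if neither person is expected to dislike it. The function name `movies_for_two` and the dict-of-ratings input are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: movie_id -> predicted rating for one user,
# e.g. as read back from a table of estimated ratings (schema assumed).
Predictions = Dict[str, float]


def movies_for_two(user_a: Predictions, user_b: Predictions,
                   top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank movies both users have predictions for by the LOWER of the two
    predicted ratings ("least misery" aggregation, an assumed strategy)."""
    shared = user_a.keys() & user_b.keys()
    scored = [(movie, min(user_a[movie], user_b[movie])) for movie in shared]
    # Highest joint score first; ties broken by movie id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    alice = {"m1": 4.8, "m2": 2.1, "m3": 4.0}
    bob = {"m1": 3.9, "m2": 4.9, "m3": 4.2, "m4": 5.0}
    print(movies_for_two(alice, bob, top_n=2))
    # [('m3', 4.0), ('m1', 3.9)]
```

Taking the minimum penalizes a movie one person would likely hate; averaging the two predictions instead would favor movies one person loves even if the other is lukewarm.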

moviEharmony in a Nutshell

The first step of the pipeline is to ingest users' movie reviews. A web page lets users submit reviews from their browser; each review is transformed into a JSON message and sent to Kafka. A batch consumer job then saves these messages from Kafka to S3.

By combining these new reviews with all the historical reviews from the Amazon dataset, I train a collaborative filtering model on my Spark cluster. Spark's machine learning library (MLlib) currently supports model-based collaborative filtering with the alternating least squares (ALS) algorithm: it learns latent factors for users and movies, then uses these factors to predict the missing movie ratings for each user. The model is saved to S3 and the estimated ratings are saved to Cassandra. Finally, the Flask application queries Cassandra to return movie recommendations to the users.
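
The ingestion step can be sketched in plain Python, assuming each review becomes a small JSON object and the batch consumer writes one newline-delimited JSON file per batch. The field names, the `review_to_message` and `batch_to_s3_object` helpers, and the `reviews/YYYY/MM/DD/` key layout are all hypothetical. In a real deployment the message bytes would be handed to a Kafka client (for example kafka-python's `KafkaProducer.send(topic, value)`) and each batch to S3 (for example boto3's `put_object(Bucket=..., Key=..., Body=...)`); those calls are left out so the sketch runs without a cluster.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Tuple


def review_to_message(user_id: str, movie_id: str, rating: float,
                      text: str = "") -> bytes:
    """Serialize one submitted review as a JSON message (field names hypothetical)."""
    # Assumption: ratings use Amazon's 1-5 star scale.
    if not 1.0 <= rating <= 5.0:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    payload = {
        "user_id": user_id,
        "movie_id": movie_id,
        "rating": rating,
        "text": text,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }
    # Kafka message values are raw bytes, so encode the JSON text as UTF-8.
    return json.dumps(payload).encode("utf-8")


def batch_to_s3_object(messages: Iterable[bytes],
                       batch_start: datetime) -> Tuple[str, bytes]:
    """Pack consumed messages into one newline-delimited JSON object.

    Returns (key, body); the date-partitioned key layout is hypothetical.
    """
    # json.dumps escapes newlines inside strings, so one message = one line.
    body = b"".join(message + b"\n" for message in messages)
    key = batch_start.strftime("reviews/%Y/%m/%d/%H%M%S.json")
    return key, body


if __name__ == "__main__":
    msg = review_to_message("u1", "m42", 4.0, "Great pacing")
    start = datetime(2015, 10, 7, 12, 0, tzinfo=timezone.utc)
    key, body = batch_to_s3_object([msg], start)
    print(key)  # reviews/2015/10/07/120000.json
    print(json.loads(body.splitlines()[0])["movie_id"])  # m42
```

Writing one object per batch rather than per review keeps the number of S3 objects small, which suits the later Spark job that reads the new reviews alongside the historical dataset.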
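
To make the ALS step concrete, here is a small dense NumPy sketch of the same technique. It is not MLlib's implementation, which distributes the work across the cluster (in PySpark the MLlib entry point is `pyspark.mllib.recommendation.ALS.train(ratings, rank, iterations, lambda_)`). The sketch minimizes the squared error on observed ratings plus an L2 penalty, Σ(r_ui − x_u·y_i)² + λ(Σ‖x_u‖² + Σ‖y_i‖²), by alternating: with the movie factors Y fixed, each user vector has the closed form x_u = (Y_uᵀY_u + λI)⁻¹ Y_uᵀ r_u, and vice versa for each movie. Missing ratings are marked with `NaN`; the toy matrix, rank, λ, and iteration count are illustrative assumptions, and MLlib's exact regularization scaling may differ.

```python
import numpy as np


def als(ratings: np.ndarray, rank: int = 2, iterations: int = 20,
        lam: float = 0.1, seed: int = 0):
    """Factor a users x movies rating matrix into X @ Y.T via ALS.

    Missing ratings are np.nan and are ignored by the loss. Returns
    X with shape (n_users, rank) and Y with shape (n_movies, rank).
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = ratings.shape
    observed = ~np.isnan(ratings)
    X = rng.normal(scale=0.1, size=(n_users, rank))
    Y = rng.normal(scale=0.1, size=(n_movies, rank))
    reg = lam * np.eye(rank)  # lam > 0 keeps every linear system solvable

    for _ in range(iterations):
        # Hold movie factors Y fixed: each user row is a ridge-regression
        # solve over only the movies that user actually rated.
        for u in range(n_users):
            rated = observed[u]
            Y_u = Y[rated]
            X[u] = np.linalg.solve(Y_u.T @ Y_u + reg, Y_u.T @ ratings[u, rated])
        # Hold user factors X fixed and solve for each movie the same way.
        for i in range(n_movies):
            raters = observed[:, i]
            X_i = X[raters]
            Y[i] = np.linalg.solve(X_i.T @ X_i + reg, X_i.T @ ratings[raters, i])
    return X, Y


if __name__ == "__main__":
    nan = np.nan
    # Toy data: rows are users, columns are movies, nan = not yet rated.
    R = np.array([
        [5.0, 4.0, nan, 1.0],
        [4.0, nan, 4.0, 1.0],
        [1.0, 1.0, nan, 5.0],
        [nan, 1.0, 2.0, 4.0],
    ])
    X, Y = als(R)
    predicted = X @ Y.T  # the nan cells now hold estimated ratings
    print(np.round(predicted, 1))
```

The entries of `predicted` at the `NaN` positions are the estimated ratings; in the pipeline described above, values like these are what get stored in Cassandra for Flask to query.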
