ykim879/movie_recommendation_system

Movie Recommendation System

A Unix/Linux environment is required.

After cloning this repository, delete these two folders: apache-maven-3.6.3 and redis-5.0.12

Data Preparation and Setup

Create Conda Environment

Download:
Make sure a Python 3.6+ installation is being used

Create a new Conda environment named movie: `conda create --name movie python=3.6`

Activate the environment: `conda activate movie`

Install Redis for Python: `pip install redis`

Install PySpark for Python: `conda install pyspark`

Install Flask: `conda install flask`
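To confirm that the environment set up above is complete, a small check like the following can be run inside the activated `movie` environment. It is a minimal sketch: the helper name `missing_packages` is hypothetical, and the list of import names is taken from the install steps above.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED = ["redis", "pyspark", "flask"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks the packages up; it does not import them, so it stays fast even for pyspark.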

Download 25M MovieLens Dataset

Download: https://grouplens.org/datasets/movielens/25m/

Make a new folder inside the movie_recommendation_system folder named datasets and unzip the dataset inside.
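A quick way to check that the dataset landed in the right place is sketched below. It assumes the archive unpacks to `datasets/ml-25m/` and that `movies.csv` and `ratings.csv` are the files needed; both are assumptions, since the paths that `fillRedis.py` actually reads are not stated here. The helper name is hypothetical.

```python
from pathlib import Path

# Assumption: the archive unpacks to datasets/ml-25m/ and the loader needs
# at least these two files. Adjust both to match what fillRedis.py reads.
EXPECTED_FILES = ["movies.csv", "ratings.csv"]


def missing_dataset_files(repo_root, subdir="datasets/ml-25m", expected=EXPECTED_FILES):
    """Return the expected file names that are absent under repo_root/subdir."""
    data_dir = Path(repo_root) / subdir
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Run from inside the movie_recommendation_system folder.
    missing = missing_dataset_files(".")
    print("Missing files:", missing if missing else "none")
```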

Install Java 8

Download: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

Ensure `JAVA_HOME` is set correctly in your shell environment.
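One way to sanity-check the `JAVA_HOME` setting from Python is sketched below. The function name `java_home_status` is hypothetical, and the check assumes a conventional JDK layout in which the `java` executable lives at `$JAVA_HOME/bin/java`.

```python
import os
from pathlib import Path


def java_home_status(env=None):
    """Describe whether JAVA_HOME points at a directory containing bin/java."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    # Assumption: standard JDK layout with the launcher under bin/.
    java_bin = Path(java_home) / "bin" / "java"
    if not java_bin.is_file():
        return f"JAVA_HOME is set, but {java_bin} does not exist"
    return f"JAVA_HOME looks usable: {java_home}"


if __name__ == "__main__":
    print(java_home_status())
```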

Install Redis-5.0

Find redis-5.0.12 at https://download.redis.io/releases/ and follow its installation process to set up `redis-server` and `redis-cli`.

Install Maven

Download: https://maven.apache.org/install.html

Install Spark-Redis

Clone the repository: https://github.com/RedisLabs/spark-redis/tree/branch-2.4 into the `movie_recommendation_system` folder

Enter the folder with `cd spark-redis` and run: `mvn clean package -DskipTests`

Copy the generated `spark-redis-<version>-jar-with-dependencies.jar` from Maven's `target` folder into the `jars` subfolder of the Conda pyspark installation.

If you have trouble finding the pyspark installation, open a Python shell inside the movie Conda environment by running `python`, then run `import pyspark` followed by `pyspark`. The module path that is printed will likely look like this: <some-path>/python<version>/site-packages/pyspark/__init__.py.
Navigate to <some-path>/python<version>/site-packages/pyspark. There should be a jars folder at this location; copy the jar file into it.
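The lookup described above can also be done without an interactive shell. The following is a minimal sketch; the helper names `package_dir` and `jars_dir` are hypothetical, and it assumes the jars folder sits directly inside the pyspark package directory, as described above.

```python
import importlib.util
from pathlib import Path


def package_dir(module_name):
    """Return the directory holding a package's __init__.py, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def jars_dir(module_name="pyspark"):
    """Return the expected jars folder inside the package directory, or None."""
    base = package_dir(module_name)
    return None if base is None else base / "jars"


if __name__ == "__main__":
    # Prints None if pyspark is not installed in the active environment.
    print(jars_dir())
```

Run it inside the movie Conda environment so that the path points at that environment's pyspark rather than a system-wide one.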

Run fillRedis.py to populate the Redis database with Movies/Ratings/Genres

Make sure the Redis server is running first: in a separate terminal window, navigate to the `src` folder within the Redis folder and run `./redis-server`. Then, inside the Conda environment, run `python fillRedis.py`.
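Before running the fill script, it can help to confirm that something is actually listening where Redis is expected. The sketch below only tests whether a TCP connection succeeds; it assumes the server runs on the local machine on Redis's default port 6379, and the helper name `port_is_open` is hypothetical. It does not verify that the listener is really Redis.

```python
import socket


def port_is_open(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on 127.0.0.1:6379.")
    else:
        print("Nothing is listening on 127.0.0.1:6379; start redis-server first.")
```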

Run app.py

Within the Conda environment, run `python app.py`. Processing takes a couple of minutes, after which the program prints the address of a local webpage that you can copy and paste into a browser.

Application and Code

Programming Language and Libraries

The program uses Python 3.6 and requires the pyspark, redis, and flask libraries. It also uses json, and you may need to import os and operator; all three are part of the standard library. Any other libraries are listed in requirements.txt and should be installed into the Conda environment with the command included at the top.

How to Run GUI

The GUI should be straightforward. At the User ID prompt, enter integers only. When updating or adding ratings, the valid values can be stepped through or typed in directly.
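The input rules above can be expressed as two small validators. This is only an illustrative sketch, not the application's own code: the function names are hypothetical, and the rating range of 0.5 to 5.0 in half-star steps is an assumption based on the MovieLens rating scale rather than something confirmed for this GUI.

```python
def parse_user_id(text):
    """Return the user ID as a positive int, or None if the text is not one."""
    try:
        value = int(text.strip())
    except ValueError:
        return None
    return value if value > 0 else None


def is_valid_rating(rating, low=0.5, high=5.0, step=0.5):
    """Check that rating lies in [low, high] and is a whole multiple of step."""
    # Assumption: half-star increments, as in the MovieLens rating scale.
    return low <= rating <= high and (rating / step).is_integer()
```

For example, `parse_user_id("4.5")` returns `None` because the text is not an integer, while `is_valid_rating(3.5)` is accepted and `is_valid_rating(3.3)` is rejected.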

Code Documentation and References

Github Pages Referenced

https://spark.apache.org/docs/2.2.0/ml-collaborative-filtering.html (For initial model - Changed parameters and how to input data)
https://github.com/jadianes/spark-movie-lens/blob/master/engine.py (For Initial Structure of engine.py - Changed all functions but kept structure)
https://github.com/databricks/spark-training/blob/master/website/movie-recommendation-with-mllib.md (For troubleshooting - Added Comparison to Baseline model)
https://github.com/snehalnair/als-recommender-pyspark (For adding parameter tuning functionality - Unused due to OutOfMemoryError)
