Make sure a Python 3.6+ installation is being used
Create a new Conda environment named movie: conda create --name movie python=3.6
Activate the environment: conda activate movie
Install the Redis client for Python: pip install redis
Install PySpark: conda install pyspark
Install Flask: conda install flask
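The installation steps above can be sanity-checked from inside the movie environment. The following is a minimal sketch, not part of the project: the helper names missing_packages and python_is_supported are hypothetical, and the package list simply mirrors the three install commands.

```python
import importlib.util
import sys

# Package names taken from the install commands above.
REQUIRED_PACKAGES = ("redis", "pyspark", "flask")


def missing_packages(names):
    """Return the names that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 6)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Missing packages:", missing_packages(REQUIRED_PACKAGES))
```

An empty list of missing packages suggests the environment is ready. Any name that is printed most likely needs to be installed again inside the activated environment.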
Make a new folder named datasets inside the movie_recommendation_system folder and unzip the dataset inside it.
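The same step can be done from Python instead of by hand. This is a sketch under assumptions: extract_dataset is a hypothetical helper, and the archive path must be supplied by the caller because the dataset file name is not given here.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, project_dir="movie_recommendation_system"):
    """Create <project_dir>/datasets if needed and unzip the archive into it."""
    datasets_dir = Path(project_dir) / "datasets"
    # exist_ok lets the function be rerun without failing on an existing folder.
    datasets_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(datasets_dir)
    return datasets_dir
```

Calling extract_dataset with the path to the downloaded archive returns the datasets folder as a Path object.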
Enter the spark-redis folder (cd spark-redis) and run: mvn clean package -DskipTests
Copy the generated spark-redis-<version>-jar-with-dependencies.jar from the generated target folder and place it inside the jars subfolder of the Conda pyspark installation.
If you have trouble finding the pyspark installation, open a Python shell inside the movie conda environment by calling python, then run import pyspark followed by pyspark. The printed module path will likely look as follows: <some-path>/python<version>/site-packages/pyspark/__init__.py.
Navigate to <some-path>/python<version>/site-packages/pyspark. There should be a jars folder at this location; copy the aforementioned jar file into it.
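Locating the pyspark folder and copying the jar can also be scripted. The sketch below is only illustrative: package_dir and copy_jar are hypothetical helpers, and the build folder passed to copy_jar is assumed to be Maven's default output directory inside spark-redis.

```python
import glob
import importlib.util
import os
import shutil


def package_dir(package_name="pyspark"):
    """Return the directory an installed package is imported from."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{package_name} is not installed in this environment")
    return os.path.dirname(spec.origin)


def copy_jar(build_dir, jars_dir):
    """Copy the single jar-with-dependencies file from build_dir into jars_dir."""
    pattern = os.path.join(build_dir, "spark-redis-*-jar-with-dependencies.jar")
    matches = glob.glob(pattern)
    # Refuse to guess when the build produced no jar or several candidates.
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one jar matching {pattern}, found {len(matches)}"
        )
    return shutil.copy2(matches[0], jars_dir)
```

A call such as copy_jar("spark-redis/target", os.path.join(package_dir("pyspark"), "jars")), run from the folder that contains spark-redis, would then perform the copy described above. The function raises an error rather than picking a file when the match is ambiguous.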
https://github.com/jadianes/spark-movie-lens/blob/master/engine.py (For Initial Structure of engine.py - Changed all functions but kept structure)
https://github.com/databricks/spark-training/blob/master/website/movie-recommendation-with-mllib.md (For troubleshooting - Added Comparison to Baseline model)
https://github.com/snehalnair/als-recommender-pyspark (For adding parameter tuning functionality - Unused due to OutOfMemoryError)