To demonstrate how to build an analytic job with Mahout on EMR, we’ll build a movie recommender. We will start with the ratings that users gave to movie titles in the MovieLens data set, which was compiled by the GroupLens team, and will use the “recommenditembased” example to find the most-recommended movies for each user.
In the CLI, type the commands below to download the data, convert it to CSV, load it into HDFS, and run the recommender:
wget http://files.grouplens.org/datasets/movielens/ml-1m.zip
unzip ml-1m.zip
cat ml-1m/ratings.dat | sed 's/::/,/g' | cut -f1-3 -d, > ratings.csv
hadoop fs -put ratings.csv /ratings.csv
mahout recommenditembased --input /ratings.csv --output recommendations --numRecommendations 10 --outputPathForSimilarityMatrix similarity-matrix --similarityClassname SIMILARITY_COSINE
hadoop fs -ls recommendations
hadoop fs -cat recommendations/part-r-00000 | head
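The output inspected above is plain text, so it is easy to post-process. As a rough sketch, assuming each line has the form userID, a tab, then [itemID:score,itemID:score,...] (the layout the item-based recommender job normally writes; check it against your own head output), one line can be parsed in Python as shown here. The function name parse_recommendation_line and the sample line are illustrative only and are not part of Mahout.

```python
def parse_recommendation_line(line):
    """Split one recommender output line into (user_id, [(item_id, score), ...]).

    Assumed line format: userID<TAB>[itemID:score,itemID:score,...]
    """
    user_id, _, payload = line.rstrip("\n").partition("\t")
    # Drop surrounding whitespace and the enclosing square brackets.
    payload = payload.strip().strip("[]")
    recommendations = []
    if payload:
        for pair in payload.split(","):
            item_id, _, score = pair.partition(":")
            recommendations.append((item_id, float(score)))
    return user_id, recommendations


# Hypothetical sample line, made up for illustration (not real job output).
sample = "35\t[2067:5.0,17:4.5,1041:4.0]"
user, recs = parse_recommendation_line(sample)
print(user, recs)
```

Item IDs are kept as strings here so they can be matched against the movie IDs in the MovieLens files without any conversion.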
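Next, install Twisted, Klein, and the Redis Python client, then download, build, and start a Redis server:
sudo pip3 install twisted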
sudo pip3 install klein
sudo pip3 install redis
wget http://download.redis.io/releases/redis-2.8.7.tar.gz
tar xzf redis-2.8.7.tar.gz
cd redis-2.8.7
make
./src/redis-server &
Build a web service that pulls the recommendations into Redis and responds to queries. Put the contents of bda.py (shown above) into a Python file, start it with twistd, and then query it, for example for user 37:
twistd -noy your_file_name.py &
curl localhost:8083/37
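The loading half of such a web service can be sketched as follows. This is a minimal sketch under stated assumptions, not the bda.py referred to above: it assumes the tab-separated userID / recommendation-list output format, and it writes to any object with a Redis-style set(key, value) method, such as a redis.StrictRedis client connected to the local server. The names load_recommendations and FakeStore are hypothetical, and the sample lines are made up.

```python
def load_recommendations(lines, store):
    """Store each 'userID<TAB>recommendations' line under its user ID.

    `store` is any object exposing set(key, value); a redis.StrictRedis
    client fits this interface (assumption: Redis runs on localhost:6379).
    Returns the number of records written.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, sep, recs = line.partition("\t")
        if not sep:
            continue  # skip lines that are not tab-separated
        store.set(user_id, recs)
        count += 1
    return count


class FakeStore:
    """In-memory stand-in for a Redis client, used only for this demo."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


# Made-up sample lines for illustration (not real job output).
store = FakeStore()
loaded = load_recommendations(["37\t[1:5.0,2:4.5]\n", "\n", "38\t[3:4.0]\n"], store)
print(loaded, store.get("37"))
```

Against a real server, the lines could come from the output of hadoop fs -cat recommendations/part-r-00000 and the store could be a redis.StrictRedis client. Keeping the loader independent of the client makes it easy to try out without a running Redis instance.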