song.ly - connected by songs

song.ly (song + friendly) is a song recommendation application built during my time at the Insight Data Engineering program.

Motivation

Many talented local artists get limited visibility and reach in music streaming applications - increase their reach
Build a community of people with similar musical tastes and let them explore music together - connect
Personalized recommendations often tie users down to their listening history and fail to explain why something is recommended to the user - provide transparency

song.ly presents an approach to address the above concerns.

Introduction

song.ly is a song recommendation application with the following features:

Suggest songs to a user based on the songs listened to by the most relevant friends of the user
Suggest artists to listen to based on the current location of the user
Suggest songs frequently played together with the current song (users who listened to this also listened to)
Suggest friends based on a relevance score which mimics a naive, logical implementation of collaborative filtering defined as:

Datasets

I used the "Million Song Dataset" [1], which is "a freely-available collection of audio features and metadata for a million contemporary popular music tracks" according to the LabROSA website. Along with the song metadata, a list of more than 150 million user-song request pairs was obtained from the Echo Nest [2] and Last.fm. A list of unique artists with their location information was also obtained from the Echo Nest. More details can be found here.

Data Pipeline

#### Ingestion Layer

Kafka: The user taste profile is used to synthesize additional user-song requests as a stream of request data. This stream, together with a synthesized stream of each user's current location, is ingested into Kafka.
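One way to picture the synthesis step is sketched below: taste-profile triplets (user, song, play count) are sampled in proportion to play count and turned into timestamped JSON messages. The field names, the sampling scheme and the sample triplets are assumptions made for illustration; the actual generator may differ, and the Kafka producer call is left out so the sketch runs without a broker.

```python
import json
import random


def synthesize_requests(taste_profile, n_events, start_ts=0, seed=42):
    """Yield JSON-encoded user-song request events sampled by play count."""
    rng = random.Random(seed)
    weights = [plays for _, _, plays in taste_profile]
    ts = start_ts
    for _ in range(n_events):
        user, song, _ = rng.choices(taste_profile, weights=weights, k=1)[0]
        ts += rng.randint(1, 5)  # advance the synthetic clock by a few seconds
        event = {"user_id": user, "song_id": song, "timestamp": ts}
        # Each encoded message could be handed to a Kafka producer here.
        yield json.dumps(event).encode("utf-8")


# Hypothetical triplets in the (user, song, play_count) shape.
profile = [("u1", "s1", 5), ("u1", "s2", 1), ("u2", "s1", 3)]
messages = list(synthesize_requests(profile, n_events=5))
events = [json.loads(m) for m in messages]
```

Seeding the random generator keeps the synthetic stream reproducible, which is convenient when replaying the same load against the pipeline.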

#### Streaming Layer

Spark Streaming: The ingested data is processed by Spark Streaming to extract it in the required formats. Each user-song request is stored with its timestamp in Cassandra, a NoSQL data store. The counts for requested songs and the users' current locations are stored in Redis, a caching data store, for faster access. The data is periodically flushed into HDFS.
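The per-batch bookkeeping described above can be sketched in plain Python, with a `Counter` and a `dict` standing in for the Redis structures. The function name and the record fields (`song_id`, `user_id`, `city`, `timestamp`) are hypothetical, and this is not the Spark Streaming job itself, only the aggregation logic it would need to perform on each micro-batch.

```python
from collections import Counter


def process_batch(requests, locations, song_counts, user_location):
    """Fold one micro-batch into the running aggregates."""
    # Increment the play count of every requested song.
    song_counts.update(r["song_id"] for r in requests)
    # Apply location updates oldest first so the latest one wins.
    for loc in sorted(locations, key=lambda item: item["timestamp"]):
        user_location[loc["user_id"]] = loc["city"]


song_counts = Counter()  # stand-in for the song counters kept in Redis
user_location = {}       # stand-in for the current-location lookup

batch_requests = [
    {"user_id": "u1", "song_id": "s1", "timestamp": 1},
    {"user_id": "u2", "song_id": "s1", "timestamp": 2},
    {"user_id": "u1", "song_id": "s2", "timestamp": 3},
]
batch_locations = [
    {"user_id": "u1", "city": "Boston", "timestamp": 4},
    {"user_id": "u1", "city": "Austin", "timestamp": 2},
]
process_batch(batch_requests, batch_locations, song_counts, user_location)
```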

#### Batch Layer

Spark: Apache Spark reads data from HDFS to find friend suggestions, update relevance scores and mine frequent patterns among songs. The recommendations are explained here.
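As a simplified illustration of the frequent-pattern step, the sketch below counts how many users listened to both songs of each pair and keeps pairs above a support threshold. It is plain pair counting in pure Python, not the Spark job, and the `min_support` parameter and sample requests are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_song_pairs(requests, min_support=2):
    """Count song pairs that co-occur in users' histories and keep the frequent ones."""
    songs_by_user = defaultdict(set)
    for user, song in requests:
        songs_by_user[user].add(song)

    pair_counts = Counter()
    for songs in songs_by_user.values():
        # Sorting makes (a, b) and (b, a) the same key.
        pair_counts.update(combinations(sorted(songs), 2))

    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical (user, song) requests.
requests = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"),
]
pairs = frequent_song_pairs(requests)
print(pairs)
```

The resulting pair counts are the kind of data that would back the "users who listened to this also listened to" suggestion.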

Cassandra Tables

user_song_log: (streaming) stores user-song requests partitioned by time
user_to_song: (streaming) stores user-song requests partitioned by user
song_to_user: (streaming) stores user-song requests partitioned by song
user_connections: stores user's connections (follows) partitioned by user
user_relevance: (batch) stores suggested users with relevance score
frequent_song_pairs: (batch) stores song-song frequencies
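The three streaming tables hold the same request under different partition keys, so that lookups by time, by user and by song each hit a single partition. A minimal in-memory sketch of that write pattern follows; the dictionaries stand in for the Cassandra tables, and the hourly time bucket and column order are assumptions, since the actual schemas are not shown here.

```python
from collections import defaultdict

# In-memory stand-ins for the three streaming tables.
user_song_log = defaultdict(list)  # partition key: time bucket (assumed hourly)
user_to_song = defaultdict(list)   # partition key: user
song_to_user = defaultdict(list)   # partition key: song


def record_request(user_id, song_id, timestamp):
    """Write one request to all three tables, once per partition key."""
    hour_bucket = timestamp // 3600
    user_song_log[hour_bucket].append((timestamp, user_id, song_id))
    user_to_song[user_id].append((timestamp, song_id))
    song_to_user[song_id].append((timestamp, user_id))


record_request("u1", "s1", 10)
record_request("u2", "s1", 3700)

# "Who listened to s1?" is answered from a single partition.
listeners = [user for _, user in song_to_user["s1"]]
```

Writing each request three times trades storage for read speed, a common choice in Cassandra data modeling where tables are designed around the queries they serve.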

Demo

The application can be accessed at song.ly. To log in, use username: adam, password: 123

The app may not work as intended after Feb 28, when the AWS machines will be terminated. Please see the video for a demo.
Video demo: https://youtu.be/kdWi8uVOJh8

References

[1] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.

[2] The Echo Nest Taste profile subset, the official user data collection for the Million Song Dataset, available at: http://labrosa.ee.columbia.edu/millionsong/tasteprofile
