A locality sensitive hashing based song snippet matching algorithm

HanYining/Shazam


Brief documentation of the Shazam project.

Data Structure Introduction

A song is represented as a NumPy array of shape (2, time * sampling frequency), with one row per stereo channel. For simplicity, the left and right channels are averaged to obtain a one-dimensional NumPy array that represents the song's signal over time.
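The stereo-to-mono step can be sketched minimally as below. It assumes the song is already loaded as a (2, n_samples) array, as described above, and the function name to_mono is hypothetical. Note that scipy.io.wavfile.read returns stereo data with shape (n_samples, 2), so an array read that way would need to be transposed first.

```python
import numpy as np


def to_mono(stereo):
    """Average a (2, n_samples) stereo array into a 1-D mono signal."""
    stereo = np.asarray(stereo, dtype=np.float64)
    if stereo.ndim != 2 or stereo.shape[0] != 2:
        raise ValueError("expected an array of shape (2, n_samples)")
    # Mean over the channel axis: (left + right) / 2 for every sample.
    return stereo.mean(axis=0)
```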

Windowing

To match short snippets, as Shazam requires, the full track is partitioned into overlapping windows. By default, each window is 10 seconds long and the window advances in steps of 1 second. By default, the samples in each window are also weighted by a Hamming window. All of these defaults can be changed in the shazam file.
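The windowing step can be sketched in NumPy as follows. This illustrates the stated defaults (10-second windows, 1-second step, Hamming weights) and is not the project's code. The names make_windows, window_sec and step_sec are hypothetical.

```python
import numpy as np


def make_windows(mono, fs, window_sec=10.0, step_sec=1.0):
    """Split a 1-D signal into overlapping, Hamming-weighted windows.

    Returns an array of shape (n_windows, window_len). A trailing partial
    window that cannot fill window_len samples is dropped.
    """
    mono = np.asarray(mono, dtype=np.float64)
    win_len = int(window_sec * fs)   # samples per window
    step = int(step_sec * fs)        # samples between window starts
    if len(mono) < win_len:
        return np.empty((0, win_len))
    n_windows = 1 + (len(mono) - win_len) // step
    starts = np.arange(n_windows) * step
    frames = np.stack([mono[s:s + win_len] for s in starts])
    # Broadcast the same Hamming weights over every window.
    return frames * np.hamming(win_len)
```

For example, a 150-sample signal at fs = 10 gives 100-sample windows. That yields 1 + (150 - 100) // 10 = 6 windows.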

Fast Fourier Transform

After the windows are extracted from the song, an FFT is applied to each windowed segment to obtain its frequency-amplitude structure. This is done with scipy.signal.spectrogram.
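scipy.signal.spectrogram performs the windowing and the FFT in one call, so the two steps can be expressed as below. Mapping the 10-second/1-second defaults onto nperseg and noverlap is an assumption about how the project configures the call. The name window_spectra is hypothetical.

```python
import numpy as np
from scipy import signal


def window_spectra(mono, fs, window_sec=10.0, step_sec=1.0):
    """Return (freqs, times, power) for overlapping Hamming-weighted windows."""
    mono = np.asarray(mono, dtype=np.float64)
    nperseg = int(window_sec * fs)              # samples per window
    noverlap = nperseg - int(step_sec * fs)     # overlap gives a 1-second hop
    freqs, times, power = signal.spectrogram(
        mono, fs=fs, window="hamming", nperseg=nperseg, noverlap=noverlap
    )
    # power has shape (len(freqs), n_windows): one periodogram per column.
    return freqs, times, power
```

At 44.1 kHz, a 10-second window holds 441,000 samples. Each column of power therefore has 220,501 frequency bins, which is why the next step reduces each window to a short signature.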

Peak Picker (AKA fingerprinter)

Storing the full frequency structure of every window would be extremely inefficient. Instead, the peakpicker function in the songanalyzer module performs a dimension reduction that extracts a robust identifier for each window. It selects the frequencies at which the local periodogram has a peak. The extent of each local region is set by freq_portion, which defaults to 0.2. Like the other defaults, it can be changed from the shazam user-interface file.
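The exact peak-picking rule lives in songanalyzer.peakpicker and is not reproduced here. The sketch below is a hypothetical stand-in. It splits each window's periodogram into n_bands equal frequency bands and records the index of the strongest bin in each band. The value n_bands = 25 is chosen only to match the 25-dimensional signatures mentioned in the query section, and freq_portion is not modeled.

```python
import numpy as np


def pick_peaks(power, n_bands=25):
    """For each of n_bands equal frequency bands, return the index of the
    strongest bin inside that band (hypothetical banding scheme)."""
    bands = np.array_split(np.asarray(power), n_bands)
    return np.array([int(np.argmax(band)) for band in bands])


def signature_matrix(spectrogram_power, n_bands=25):
    """Build one signature row per window from power of shape (n_freqs, n_windows)."""
    return np.stack([pick_peaks(col, n_bands) for col in spectrogram_power.T])
```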

Database Schema

At this stage, each song has been reduced to a (window, signature) matrix. This matrix is much smaller than the raw spectra, which makes it practical to store in a local PostgreSQL database. The schema is as follows.

Table Song Info
Song_id (PK) | Song_name (Unique)
1            | Hello
2            | Test

Table AKAFINGER
Signature_ID (PK) | TIME_ID (INTEGER) | SONG_ID (FOREIGN KEY) | SIGNATURE (FLOAT VECTOR)
1                 | 5                 | 1                     | [1,2,3,0,2,3,….,1]
2                 | 6                 | 1                     | [2,3,4,1,2,6,….,3]

The falconn package returns the K nearest neighbors as row numbers of the dataset matrix. It is therefore convenient to keep the serial Signature_ID as the primary key. If the matrix is loaded in Signature_ID order, a returned row number maps directly back to a signature row, which keeps lookups fast.
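The schema above can be tried out with Python's built-in sqlite3 module. This sketch swaps in SQLite for PostgreSQL and stores each signature as JSON text rather than a native float array. It illustrates the table layout only and is not the project's database code. The helpers insert_song and load_signatures are hypothetical. Loading signatures ordered by signature_id is what lets a neighbor's row number be mapped back to a song.

```python
import json
import sqlite3

import numpy as np

SCHEMA = """
CREATE TABLE song_info (
    song_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    song_name TEXT NOT NULL UNIQUE
);
CREATE TABLE akafinger (
    signature_id INTEGER PRIMARY KEY AUTOINCREMENT,
    time_id      INTEGER NOT NULL,
    song_id      INTEGER NOT NULL REFERENCES song_info(song_id),
    signature    TEXT NOT NULL  -- JSON list; PostgreSQL could use FLOAT[] instead
);
"""


def insert_song(conn, name, signatures):
    """Insert a song and its per-window signatures.

    A duplicate song name raises sqlite3.IntegrityError.
    """
    with conn:  # commit on success, roll back on any error
        cur = conn.execute("INSERT INTO song_info (song_name) VALUES (?)", (name,))
        song_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO akafinger (time_id, song_id, signature) VALUES (?, ?, ?)",
            [(t, song_id, json.dumps([float(v) for v in sig]))
             for t, sig in enumerate(signatures)],
        )
    return song_id


def load_signatures(conn):
    """Return (signature_ids, song_ids, matrix) ordered by signature_id,
    so row i of matrix corresponds to signature_ids[i]."""
    rows = conn.execute(
        "SELECT signature_id, song_id, signature FROM akafinger ORDER BY signature_id"
    ).fetchall()
    sig_ids = [r[0] for r in rows]
    song_ids = [r[1] for r in rows]
    matrix = np.array([json.loads(r[2]) for r in rows], dtype=np.float64)
    return sig_ids, song_ids, matrix
```

The UNIQUE constraint on song_name is one way to enforce the no-duplicates rule described for shazam insert.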

The Query Process (LSH)

Once the signature database is set up, the dimension of each window has been reduced. Under the default parameters, however, each input is still 25-dimensional. The query module therefore uses locality-sensitive hashing (LSH) through the falconn package. The setup_lsh function retrieves the signature matrix of all songs in the database and builds a falconn query object. To improve hashing quality, the matrix is centered by subtracting its mean. The center is recorded so that the same centering can be applied to the query snippet.
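The project uses falconn for LSH. To keep the example self-contained, the sketch below instead hand-rolls random-hyperplane LSH with NumPy; hyperplane hashing is one of the families falconn offers. It also includes the centering step described above. HyperplaneLSH, n_tables and n_bits are hypothetical names and defaults, not falconn's API.

```python
import numpy as np


class HyperplaneLSH:
    """Minimal random-hyperplane LSH over centered data (a stand-in for falconn)."""

    def __init__(self, data, n_tables=8, n_bits=12, seed=0):
        data = np.asarray(data, dtype=np.float64)
        self.center = data.mean(axis=0)   # recorded so queries get the same shift
        self.data = data - self.center
        rng = np.random.default_rng(seed)
        # One set of n_bits random hyperplanes (through the origin) per table.
        self.planes = rng.standard_normal((n_tables, n_bits, data.shape[1]))
        self.tables = []
        for t in range(n_tables):
            table = {}
            for row, key in enumerate(self._keys(self.data, t)):
                table.setdefault(key, []).append(row)
            self.tables.append(table)

    def _keys(self, x, t):
        # The bucket key is the sign pattern of the projections onto table t's planes.
        bits = (np.atleast_2d(x) @ self.planes[t].T) > 0
        return [b.tobytes() for b in bits]

    def query(self, q, k=3):
        """Return up to k row indices of approximate nearest neighbors of q."""
        qc = np.asarray(q, dtype=np.float64) - self.center
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._keys(qc, t)[0], []))
        if not candidates:
            return []
        cand = np.fromiter(candidates, dtype=int)
        # Re-rank the candidates from all tables by exact Euclidean distance.
        dists = np.linalg.norm(self.data[cand] - qc, axis=1)
        return cand[np.argsort(dists)[:k]].tolist()
```

The hyperplanes pass through the origin. Without centering, data lying far from the origin would fall on the same side of most hyperplanes and crowd into a few buckets. This is why the recorded center is also subtracted from each query.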

Matching (Confidence Level Calculated)

At this stage, the query snippet is matched through falconn. Suppose the snippet yields N windows. To make matching more robust, falconn returns K approximate nearest neighbors for each of the N windows. These matches are not perfect: the true match may be ranked 2nd or 3rd among a window's K nearest neighbors. A fading parameter therefore lowers the confidence contributed by lower-ranked matches. This uses the extra information while penalizing mismatches. Matches with a confidence level below 70% are not reported.
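The exact confidence formula is not given here. One plausible reading, shown as a hypothetical sketch, works as follows:

- The neighbor at rank r (counting from 0) gets weight fade ** r.
- Weights are summed per song.
- The best song's share of the total weight is its confidence.

The fade value of 0.5 is an arbitrary illustrative choice, while the 0.7 threshold comes from the text.

```python
from collections import defaultdict


def match_confidence(neighbor_songs, fade=0.5, threshold=0.7):
    """neighbor_songs holds one list per query window with the song IDs of
    its K nearest neighbors, ordered from best (rank 0) to worst.

    Returns (song_id, confidence), or (None, confidence) below the threshold.
    """
    scores = defaultdict(float)
    total = 0.0
    for ranked in neighbor_songs:
        for rank, song_id in enumerate(ranked):
            weight = fade ** rank   # lower-ranked neighbors count for less
            scores[song_id] += weight
            total += weight
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    confidence = scores[best] / total
    return (best, confidence) if confidence >= threshold else (None, confidence)
```

Consider three windows whose neighbors belong to songs [1, 1, 2], [1, 2, 1] and [1, 1, 1]. Song 1 collects 4.5 of the 5.25 total weight, a confidence of about 0.86, so it is reported.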

User Interface with Argparse package

In the shazam file, a command-line interface is built with the argparse package. It supports three subcommands (subparsers) for the different operations:

  1. shazam digest DIRECTORY: digest an entire directory of songs, using a regular expression to extract song names from the filenames.
  2. shazam insert --title "song name" --artist "artist name" filename: insert a single song into the local database. Duplicate songs are not allowed; attempting to insert one produces a notice.
  3. shazam identify file: identify a WAV-format snippet against the songs in the database.
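The three subcommands can be reproduced with argparse subparsers roughly as follows. Argument names mirror the usage lines above. Two details are assumptions: marking --title and --artist as required, and the .wav filename pattern used to derive song names for digest. The helper song_name_from_filename is hypothetical.

```python
import argparse
import os
import re

# Hypothetical pattern: "Hello.wav" -> song name "Hello".
WAV_NAME = re.compile(r"^(?P<name>.+)\.wav$", re.IGNORECASE)


def song_name_from_filename(path):
    """Extract a song name from a .wav filename, or return None if it does not match."""
    m = WAV_NAME.match(os.path.basename(path))
    return m.group("name") if m else None


def build_parser():
    parser = argparse.ArgumentParser(prog="shazam")
    sub = parser.add_subparsers(dest="command", required=True)

    digest = sub.add_parser("digest", help="digest a directory of songs")
    digest.add_argument("directory")

    insert = sub.add_parser("insert", help="insert one song into the database")
    insert.add_argument("--title", required=True)
    insert.add_argument("--artist", required=True)
    insert.add_argument("filename")

    identify = sub.add_parser("identify", help="identify a WAV snippet")
    identify.add_argument("file")
    return parser
```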

Accuracy Estimates

I tested songs that are already in the database to check whether matching is accurate. Every one of them was matched correctly, with a confidence level of roughly 98-100%. I also tried 5 songs that are not in the database, and all of them produced a "no match" result.

Test Cases and Test Driver

Test cases are provided in the test/ subfolder. They mainly cover:

  1. the calculation of matching scores
  2. the matched song name
  3. the peak-picking procedure
  4. whether reading in a single song works correctly.

The Python source files and test files are organized into packages, each with an __init__.py file. As a result, all tests can be run with python -m unittest discover.

Possible improvements and further directions.

Ideally, the matching process should also take the temporal order of windows into account. For example, if a snippet is divided into 5 windows, the matched windows should appear in consecutive order in the corresponding song. The current representation does not support this.
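One possible approach is offset voting. If query window i matches song window t, the pair votes for the offset t - i, and a true match accumulates many votes at a single (song, offset) pair. The sketch below is a hypothetical illustration that relies on the TIME_ID column from the schema; best_aligned_match is not part of the project.

```python
from collections import Counter


def best_aligned_match(window_matches):
    """window_matches[i] lists the (song_id, time_id) pairs returned for
    query window i. Returns (song_id, offset, votes) for the best-aligned
    match, or None if there are no matches."""
    votes = Counter()
    for i, matches in enumerate(window_matches):
        for song_id, time_id in set(matches):   # each pair votes once per window
            votes[(song_id, time_id - i)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count
```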
