

Differentiable Fingerprinting: a convolutional architecture for cover song detection

Work in progress on 'learning to fingerprint' for challenging audio-based content ID problems, such as cover song detection.

Currently focused on experiments in which a fingerprint is learned from a dataset of cover songs. The main idea behind this is explained in our Audio Bigrams paper [1].

See this notebook.

Very briefly explained:

1. Most fingerprints encode some kind of co-occurrence of salient events (e.g., Shazam's landmark-based fingerprinter, 'intervalgrams'...).
2. 'Salient event detection' can be implemented as a convolution: `conv2d(X, W)`, with `W` the 'salient events'.
3. Co-occurrence can be implemented as `conv2d(X, w) @ X.T`, with `w` a window and `@` the matrix product.
4. All of this is differentiable; therefore, any fingerprinting system that can be formulated like this can be trained 'end-to-end' (see the sketch below).
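A minimal sketch of these building blocks, here in PyTorch (the repo itself may use a different framework). `X` is a hypothetical (features x time) matrix such as a chromagram; the template count `K`, context and window lengths are illustrative, not values from this repo:

```python
import torch
import torch.nn.functional as F

n_features, n_frames = 12, 500            # e.g. a chromagram
X = torch.rand(n_features, n_frames)

# steps 1-2: 'salient event detection' as a convolution with K learnable
# event templates, each spanning all features and a short time context
K, context = 8, 5
W = torch.randn(K, 1, n_features, context, requires_grad=True)
events = F.conv2d(X.view(1, 1, n_features, n_frames), W,
                  padding=(0, context // 2))       # -> (1, K, 1, n_frames)
events = events.squeeze(0).squeeze(1)              # -> (K, n_frames)

# step 3: co-occurrence -- smooth the event activations over a time
# window w, then take the matrix product with X.T; the result is a
# fixed-size (K x n_features) matrix, independent of song length
win = 32
w = torch.full((K, 1, 1, win), 1.0 / win, requires_grad=True)
smoothed = F.conv2d(events.view(1, K, 1, n_frames), w,
                    padding=(0, win // 2), groups=K)
smoothed = smoothed.squeeze(0).squeeze(1)[:, :n_frames]   # -> (K, n_frames)
fingerprint = smoothed @ X.T                              # -> (K, n_features)

# step 4: everything above is differentiable, so a loss on the
# fingerprint (e.g. a distance between cover pairs) backpropagates
# into the templates W and the window w
fingerprint.sum().backward()
```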

To evaluate the learned fingerprint, we compare to the elegant and performant '2D Fourier Transform Magnitude Coefficients' by Bertin-Mahieux and Ellis [2], and a simpler fingerprinting approach by Kim et al. [3].
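For reference, a minimal NumPy sketch of the 2DFTM baseline idea: 2D FFT magnitudes of fixed-length chroma patches, aggregated with a median. The 75-frame patch length follows [2]; the hop size and the omission of the paper's PCA step are simplifications:

```python
import numpy as np

def fourier_magnitude_fingerprint(chroma, patch_len=75, hop=1):
    """2DFTM-style fingerprint after Bertin-Mahieux & Ellis [2].

    `chroma` is a (n_features, n_frames) matrix, e.g. beat-synchronous
    chroma. Slide a patch over time, take the 2D FFT magnitude of each
    patch, and aggregate across patches with a median.
    """
    patches = [chroma[:, i:i + patch_len]
               for i in range(0, chroma.shape[1] - patch_len + 1, hop)]
    mags = [np.abs(np.fft.fft2(p)) for p in patches]
    return np.median(mags, axis=0).ravel()
```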

We use the SecondHandSongs dataset, with duplicates removed as proposed by Julien Osmalskyj.

[1] Van Balen, J., Wiering, F., & Veltkamp, R. (2015). Audio Bigrams as a Unifying Model of Pitch-based Song Description.

[2] Bertin-Mahieux, T., & Ellis, D. P. W. (2012). Large-Scale Cover Song Recognition Using the 2D Fourier Transform Magnitude. In Proc. International Society for Music Information Retrieval Conference.

[3] Kim, S., Unal, E., & Narayanan, S. (2008). Music Fingerprint Extraction for Classical Music Cover Song Identification. In Proc. IEEE International Conference on Multimedia and Expo.


(c) 2016 Jan Van Balen

github.com/jvbalen - twitter.com/jvanbalen
