ReChord

A Tool for Chord Sequence Detection

Table of Contents

  • Motivation
  • Requirements
  • Usage
  • Example
  • Approach
  • Product Design
  • Tools
  • Acknowledgements

Motivation

Chord transcription is the skill of detecting the chord progression of a musical composition by ear. Acquiring this skill is time-consuming and quite daunting for most musicians, which is why the majority use sheet music to learn songs.

Even though it is easy to find accurate sheet music or tabs for the classics, that is not the case for newly released or more obscure music.

Therefore, I built ReChord - a web application that transcribes chords for you in just a few minutes!

I utilized my skills in Software Development, Deep Learning, Signal Processing, and Music Theory to create the application, and I hope it will prove useful for fellow musicians and music enthusiasts.

Learn more: Slides Demo

Requirements

You will need Python 3.7 or higher.

Please install the dependencies:

  • Clone the repo with git clone https://github.com/belovm96/chord-detection
  • From the repo's root directory, run pip3 install -r requirements.txt

If you would like to use the ReChord App, you will need Docker, Streamlit, and FFmpeg. Otherwise, just install FFmpeg.

Usage

Both the ReChord App and the ReChord Command Line Tool require a GPU on your machine!

Web App

  • Clone this repository
  • From the repo's root directory, cd ChordDetection/app
  • Pull the base Docker image: docker pull tensorflow/tensorflow:latest-gpu
  • Build the app's Docker image: docker image build -t streamlit:app .
  • Run the app: docker container run --gpus all -p 8501:8501 streamlit:app

Command Line

  • Clone this repository
  • Put the song that you would like to transcribe in the data folder of the repo's root directory
  • To get chord transcriptions, run python transcribe.py --song ./data/song name
    • If the song's file name contains spaces, enclose it in quotes, e.g. python transcribe.py --song ./data/'U2 - With Or Without You - Remastered.mp3'
    • The script will ask you to provide the time interval of the song that you would like annotated
  • Chord-Time representations will be saved to the annotations folder in PNG format

Example

Example usage can be found in the notebooks. You can also follow the steps below to get an idea of how ReChord can be used as a script.

  • Run python transcribe.py --song ./data/'U2 - With Or Without You - Remastered.mp3'
    Inference takes 1-3 minutes, depending on your GPU capabilities. During inference, the file will be converted to WAV format and stored in the same directory as your input song.
  • Enter the time interval: 10:20
  • The Chord-Time representation will be saved to annotations
  • The Chord-Time representation can be read as follows:
    • y-axis - chords to play
    • x-axis - time in seconds
    • each purple square indicates which chord to play at each 0.1-second time step. In the example below, no chord is played from 10 to 12.5 seconds, and the D chord is played from 12.5 to 20 seconds of the song.
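
To make the layout concrete, here is a minimal plotting sketch (not the app's actual code) that renders a Chord-Time representation like the one described above. The chord list, the interval, and the activation matrix are hypothetical stand-ins chosen to match the example:

```python
# Sketch only: render a Chord-Time heatmap from a toy activation matrix.
# One row per chord label, one column per 0.1 s time step.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

chords = ['N', 'C', 'D', 'E', 'F', 'G', 'A', 'B']  # 'N' = no chord (assumed labels)
start, end = 10.0, 20.0                            # annotated interval in seconds
times = np.arange(start, end, 0.1)                 # 0.1 s time steps

# Toy activations matching the example: 'N' from 10-12.5 s, 'D' from 12.5-20 s.
activations = np.zeros((len(chords), len(times)))
activations[chords.index('N'), times < 12.5] = 1
activations[chords.index('D'), times >= 12.5] = 1

ax = sns.heatmap(activations, cmap='Purples', cbar=False,
                 xticklabels=10, yticklabels=chords)
ax.set_xticklabels([f'{t:.0f}' for t in times[::10]])  # label every 10th step
ax.set_xlabel('time (s)')
ax.set_ylabel('chord')
plt.show()
```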

Approach

My data → model → predictions pipeline can be summarized as follows:

First stage - Preprocessing

The Short-time Fourier Transform is used to convert the raw audio signal into a spectrogram - a time-frequency representation of the signal. Then, a filterbank with logarithmically spaced filters is applied to the spectrogram to rescale the frequency axis, equalizing the frequency distance between neighbouring notes in all areas of the spectrogram. Finally, a logarithmic function is applied to the spectrogram values to compress the value range, and the spectrogram is cut up into 0.1-second frames in a sliding-window fashion. These 0.1-second spectrogram frames are fed into the deep learning model for training and inference.
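
The sketch below illustrates these four steps with plain NumPy/SciPy (the project itself lists madmom, which provides equivalent spectrogram utilities). The frame size, filter count, frequency range, and context length are illustrative assumptions, not the project's exact settings:

```python
# Minimal sketch of the preprocessing stage; parameters are assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

FPS = 10  # one spectrogram column every 0.1 s

sr, audio = wavfile.read('song.wav')  # assumes e.g. 44.1 kHz WAV input
if audio.ndim > 1:
    audio = audio.mean(axis=1)        # downmix to mono

# 1. Short-time Fourier Transform -> time-frequency representation
freqs, times, spec = stft(audio, fs=sr, nperseg=8192,
                          noverlap=8192 - sr // FPS)
mag = np.abs(spec)

# 2. Logarithmically spaced triangular filterbank to equalize note spacing
n_filters = 105
centers = np.geomspace(65.0, 2100.0, n_filters + 2)  # ~C2 to ~C7 (assumed)
bank = np.zeros((n_filters, len(freqs)))
for i in range(n_filters):
    lo, mid, hi = centers[i], centers[i + 1], centers[i + 2]
    rise = (freqs >= lo) & (freqs <= mid)
    fall = (freqs > mid) & (freqs <= hi)
    bank[i, rise] = (freqs[rise] - lo) / (mid - lo)
    bank[i, fall] = (hi - freqs[fall]) / (hi - mid)
filtered = bank @ mag

# 3. Logarithmic compression of the value range
log_spec = np.log1p(filtered)

# 4. Slide a window over the 0.1 s columns; each window is one model input
context = 15  # columns of context per example (illustrative)
windows = [log_spec[:, i:i + context]
           for i in range(log_spec.shape[1] - context + 1)]
```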

Second stage - Modeling

A Fully Convolutional Neural Network is trained on the log-filtered spectrogram frames for chord prediction. However, these predictions are not used directly, since the raw per-frame outputs tend to be fragmented, which might confuse the end user of the application. Moreover, the FCNN does not exploit the fact that chords are always parts of chord progressions, i.e. sequences, losing part of the potential predictive power. Therefore, a Conditional Random Field is introduced into the deep learning architecture to smooth out the chord sequence predictions and to capture frame-to-frame dependencies between the predictions at every time step. The features extracted by the CNN are the inputs to the CRF, and the final chord sequence predictions are obtained using Viterbi decoding.
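
As a rough illustration of the decoding step, here is a minimal NumPy sketch of Viterbi decoding over per-frame chord scores. In the application this is handled inside the CRF layer (tf2crf is listed under Tools); the scores and transition matrix below are toy values:

```python
# Sketch only: Viterbi decoding of the best chord sequence.
import numpy as np

def viterbi_decode(unary, transition):
    """unary: (T, K) per-frame chord scores (e.g. from the CNN).
    transition: (K, K) chord-to-chord transition scores.
    Returns the highest-scoring chord index sequence of length T."""
    T, K = unary.shape
    score = unary[0].copy()               # best score ending in each chord at t=0
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best path ending in chord i at t-1, then moving to j
        cand = score[:, None] + transition
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    # Backtrack from the best final chord
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy usage: 5 frames, 3 chord classes; "sticky" transitions favour smoothness
rng = np.random.default_rng(0)
unary = rng.normal(size=(5, 3))
transition = np.full((3, 3), -1.0) + 2.0 * np.eye(3)  # staying put is cheap
print(viterbi_decode(unary, transition))
```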

Product Design

Product design diagram

Tools

Packages & Tools used for development:

  • Docker
  • Streamlit
  • FFmpeg
  • spotdl
  • TensorFlow
  • Keras
  • tf2crf
  • NumPy
  • Seaborn
  • Matplotlib
  • madmom
  • YAML

Acknowledgements

[1] Filip Korzeniowski, whose research paper was implemented and integrated into this application.
[2] The Insight Artificial Intelligence Program, for the opportunity to work on this project and for their guidance and support.