l3embedding

Code for running the experiments presented in:

Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon and Juan Pablo Bello
Under review, 2018.

For the pre-trained embedding models (openL3), please go to: github.com/marl/openl3

This repository contains an implementation of the model proposed in Look, Listen and Learn (Arandjelović, R., Zisserman, A. 2017). The model uses videos to learn vision and audio features in an unsupervised fashion by training on the proposed Audio-Visual Correspondence (AVC) task. The task is to determine whether a piece of audio and an image frame come from the same video and occur simultaneously.
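
To make the AVC task concrete, here is a minimal sketch of how labeled audio/frame pairs could be sampled. It is not taken from this repository: the function name sample_avc_pair, the durations mapping, and the one-second window are all hypothetical, and drawing negatives from a different video is an assumption made for illustration.

```python
import random


def sample_avc_pair(durations, rng, window=1.0):
    """Sample one (audio, frame, label) triple for an AVC-style task.

    durations: dict mapping a video id to its length in seconds
               (hypothetical input; every video must be at least
               `window` seconds long and there must be >= 2 videos).
    Returns ((audio_video, audio_start), (frame_video, frame_time), label),
    where label is 1 for a corresponding pair and 0 otherwise.
    """
    video_ids = sorted(durations)
    audio_video = rng.choice(video_ids)
    audio_start = rng.uniform(0.0, durations[audio_video] - window)

    if rng.random() < 0.5:
        # Positive: the frame comes from the same video, at the
        # midpoint of the audio window, so the two co-occur.
        frame_video = audio_video
        frame_time = audio_start + window / 2.0
        label = 1
    else:
        # Negative (assumption): the frame comes from a different video.
        others = [v for v in video_ids if v != audio_video]
        frame_video = rng.choice(others)
        frame_time = rng.uniform(0.0, durations[frame_video])
        label = 0

    return (audio_video, audio_start), (frame_video, frame_time), label
```

Calling sample_avc_pair({"a": 10.0, "b": 8.0}, random.Random(0)) returns the audio segment reference, the frame reference, and the binary correspondence label that a model would be trained to predict.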

Dependencies

  • Python 3 (we use 3.6.3)
  • ffmpeg
  • sox
  • TensorFlow (follow instructions carefully, and install before other Python dependencies)
  • keras (follow instructions carefully!)
  • Other Python dependencies can be installed via pip install -r requirements.txt

The code for the model and training implementation can be found in l3embedding/. Note that the expected metadata format is the same as that used in AudioSet (Gemmeke, J., Ellis, D., et al. 2017), since training this model on AudioSet was one of the goals of this implementation.

You can train an AVC/embedding model using train.py. Run python train.py -h to read the help message regarding how to use the script.

There is also a module, classifier/, which contains code to train a classifier on embeddings extracted from new audio using the embedding model. Currently, this only supports the UrbanSound8K dataset (Salamon, J., Jacoby, C., Bello, J. 2014).

You can train an urban sound classification model using train_classifier.py. Run python train_classifier.py -h to read the help message regarding how to use the script.

Download VGGish models:

  • cd ./resources/vggish
  • curl -O https://storage.googleapis.com/audioset/vggish_model.ckpt
  • curl -O https://storage.googleapis.com/audioset/vggish_pca_params.npz
  • cd ../..
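
If curl is not available, the same two files can be fetched from Python. The following is a rough sketch, not part of this repository: the helper names vggish_targets and download_vggish are hypothetical, and it assumes the script is run from the repository root so that resources/vggish resolves correctly.

```python
import os
import urllib.request

# URLs are the ones listed in the curl commands above.
VGGISH_BASE_URL = "https://storage.googleapis.com/audioset/"
VGGISH_FILES = ["vggish_model.ckpt", "vggish_pca_params.npz"]


def vggish_targets(dest_dir="resources/vggish"):
    """Return (url, local_path) pairs for the VGGish model files."""
    return [
        (VGGISH_BASE_URL + name, os.path.join(dest_dir, name))
        for name in VGGISH_FILES
    ]


def download_vggish(dest_dir="resources/vggish", overwrite=False):
    """Download any missing VGGish files and return the paths fetched."""
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for url, path in vggish_targets(dest_dir):
        # Skip files that already exist unless overwrite is requested.
        if overwrite or not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
            fetched.append(path)
    return fetched
```

Calling download_vggish() from the repository root should leave both files in resources/vggish, matching the result of the curl commands.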

If you use a SLURM environment, sbatch scripts are available in jobs/.
