l3embedding

Code for running the experiments presented in:

Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon and Juan Pablo Bello
Under review, 2018.

For the pre-trained embedding models (openL3), please go to: github.com/marl/openl3

This repository contains an implementation of the model proposed in Look, Listen and Learn (Arandjelović, R., Zisserman, A. 2017). This model uses videos to learn vision and audio features in an unsupervised fashion by training on the proposed Audio-Visual Correspondence (AVC) task. The task is to determine whether a piece of audio and an image frame come from the same video and occur simultaneously.
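To make the AVC task described above concrete, here is a minimal sketch of how positive and negative training pairs could be drawn. It is an illustration only, not the sampling code used in this repository: the function name `sample_avc_pair`, the `durations` mapping, and the 1-second `clip_len` default are all assumptions made for this example.

```python
import random


def sample_avc_pair(video_ids, durations, rng, positive, clip_len=1.0):
    """Draw one (audio, frame, label) example for the AVC task.

    Hypothetical helper for illustration. `durations` maps a video id to
    its length in seconds; `clip_len` is an assumed audio clip length.
    """
    audio_vid = rng.choice(video_ids)
    t_audio = rng.uniform(0.0, durations[audio_vid] - clip_len)

    if positive:
        # Corresponding pair: frame and audio share the video and the time.
        return {"audio": (audio_vid, t_audio),
                "frame": (audio_vid, t_audio),
                "label": 1}

    # Non-corresponding pair: the frame comes from a different video.
    other_vid = rng.choice([v for v in video_ids if v != audio_vid])
    t_frame = rng.uniform(0.0, durations[other_vid] - clip_len)
    return {"audio": (audio_vid, t_audio),
            "frame": (other_vid, t_frame),
            "label": 0}


rng = random.Random(0)
videos = ["vid_a", "vid_b", "vid_c"]            # placeholder ids
lengths = {"vid_a": 10.0, "vid_b": 10.0, "vid_c": 10.0}
pos = sample_avc_pair(videos, lengths, rng, positive=True)
neg = sample_avc_pair(videos, lengths, rng, positive=False)
```

A binary classifier trained on such pairs has to learn audio and image features that are informative about each other, which is what makes the resulting audio subnetwork usable as an embedding.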

Dependencies

Python 3 (we use 3.6.3)
ffmpeg
sox
TensorFlow (follow instructions carefully, and install before other Python dependencies)
keras (follow instructions carefully!)
Other Python dependencies can be installed via pip install -r requirements.txt

The code for the model and training implementation can be found in l3embedding/. Note that the metadata format expected is the same used in AudioSet (Gemmeke, J., Ellis, D., et al. 2017), as training this model on AudioSet was one of the goals for this implementation.
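As a rough illustration of the metadata format mentioned above, the sketch below parses rows in the layout of the public AudioSet segment CSV files: comment lines starting with `#`, then `YTID, start_seconds, end_seconds, positive_labels` with the labels in one quoted field. The function name `parse_audioset_segments` is hypothetical and is not taken from this repository; the sample rows are made up for the example.

```python
import csv


def parse_audioset_segments(lines):
    """Parse AudioSet-style segment rows into dictionaries.

    Hypothetical helper for illustration. `lines` is any iterable of
    strings, such as an open file object.
    """
    # Skip blank lines and the '#'-prefixed header comments.
    rows = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    # Fields are separated by ", " so leading spaces must be skipped
    # for the quoted label field to be recognized.
    reader = csv.reader(rows, skipinitialspace=True)

    segments = []
    for ytid, start, end, labels in reader:
        segments.append({
            "ytid": ytid,
            "start": float(start),
            "end": float(end),
            "labels": labels.split(","),
        })
    return segments


sample = [
    "# YTID, start_seconds, end_seconds, positive_labels\n",
    'abc123DEF45, 30.000, 40.000, "/m/09x0r,/t/dd00088"\n',  # made-up row
]
segments = parse_audioset_segments(sample)
```

With a real metadata file, the same function can be called on the file object returned by `open(path, newline="")`.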

You can train an AVC/embedding model using train.py. Run python train.py -h to read the help message regarding how to use the script.

There is also a module classifier/, which contains code to train a classifier on embeddings extracted from new audio with the embedding model. Currently, this only supports the UrbanSound8K dataset (Salamon, J., Jacoby, C., Bello, J. 2014).

You can train an urban sound classification model using train_classifier.py. Run python train_classifier.py -h to read the help message regarding how to use the script.

Download VGGish models:

cd ./resources/vggish
curl -O https://storage.googleapis.com/audioset/vggish_model.ckpt
curl -O https://storage.googleapis.com/audioset/vggish_pca_params.npz
cd ../..

If you use a SLURM environment, sbatch scripts are available in jobs/.

