Music Style Transfer

Authors: Henrik Høiness (henrhoi), Axel Harstad (axeloh) and Marius Landsverk Cervera (mariusmcl)

Deep learning models trained on visual data have changed the field of computer vision and are now finding their way into more and more consumer products and business applications.

Humans, however, depend heavily on audio to navigate their lives, and powerful models for raw audio data have numerous applications, ranging from the arts to new business opportunities. In this work we expand upon recent architectural advances (Aaron van den Oord et al.: WaveNet: A Generative Model for Raw Audio (2016)) and propose a WaveNet-like autoencoder with a shared encoder and multiple decoders to perform style transfer between multiple musical instruments. In addition, we explore this model's potential for automatic music composition by reducing the latent space used in the network.

Our method consists of training multiple decoders, one for each domain, together with a shared encoder. To enforce a disentangled latent representation, a domain classifier is trained to predict the domain of each latent representation. The audio input is also augmented, which further pushes the encoder toward a high-level, semantic encoding.
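The routing described above can be illustrated with a minimal sketch. It is not the code in this repo: the encoder and decoders are toy stand-ins, the domain keys `cello` and `piano` and all function names are hypothetical, and the loss assumes the adversarial formulation in which the autoencoder is rewarded when the domain classifier does badly.

```python
from typing import Callable, Dict, List

Signal = List[float]
Latent = List[float]


def translate(audio: Signal,
              encoder: Callable[[Signal], Latent],
              decoders: Dict[str, Callable[[Latent], Signal]],
              target_domain: str) -> Signal:
    """Encode with the shared encoder, then decode with the target domain's decoder."""
    if target_domain not in decoders:
        raise KeyError(f"no decoder trained for domain {target_domain!r}")
    latent = encoder(audio)  # the same encoder is used for every source domain
    return decoders[target_domain](latent)


def autoencoder_loss(reconstruction_loss: float,
                     classifier_loss: float,
                     confusion_weight: float) -> float:
    """Assumed objective: reconstruction loss minus a weighted classifier loss."""
    return reconstruction_loss - confusion_weight * classifier_loss


# Toy stand-ins (hypothetical): the encoder averages neighbouring samples,
# and each "decoder" upsamples the latent with a domain-specific gain.
def toy_encoder(audio: Signal) -> Latent:
    return [(audio[i] + audio[i + 1]) / 2 for i in range(0, len(audio) - 1, 2)]


def make_toy_decoder(gain: float) -> Callable[[Latent], Signal]:
    def decode(latent: Latent) -> Signal:
        return [gain * z for z in latent for _ in range(2)]
    return decode


toy_decoders = {"cello": make_toy_decoder(0.5), "piano": make_toy_decoder(1.0)}

cello_clip = [1.0, 3.0, 2.0, 4.0]
as_piano = translate(cello_clip, toy_encoder, toy_decoders, "piano")
```

The point of the sketch is that style transfer needs no paired data: any clip is encoded once and can then be rendered by whichever decoder matches the desired target domain.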

The architecture is heavily inspired by Noam Mor et al.: A Universal Music Translation Network (2018) (repo).

All samples in this repo are from a WaveNet autoencoder trained for 4 days on two Tesla V100 GPUs, with two decoders for the two domains Bach Solo Cello and Beethoven Solo Piano.

You can listen to the samples and see the architecture used here.

Dataset

The dataset used in the experiments is MusicNet, which can be found here.

To preprocess and extract domains from the dataset, run the scripts below in the following order:

$ python preprocessing/seperate_domains.py
$ python preprocessing/train_test_val_split.py
$ python preprocessing/preprocess.py
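Because the order matters, the three steps can also be chained so that a failure in one stops the rest. The following is a small sketch of such a wrapper; the wrapper itself and the names `build_commands` and `run_preprocessing` are hypothetical and not part of this repo, and it assumes it is called from the repository root.

```python
import subprocess
import sys
from typing import List

# Script paths exactly as listed above, in the required order.
PREPROCESSING_SCRIPTS = [
    "preprocessing/seperate_domains.py",
    "preprocessing/train_test_val_split.py",
    "preprocessing/preprocess.py",
]


def build_commands(python_executable: str = sys.executable) -> List[List[str]]:
    """Return one command per preprocessing script, preserving the order."""
    return [[python_executable, script] for script in PREPROCESSING_SCRIPTS]


def run_preprocessing() -> None:
    """Run each step in turn; check=True raises on the first non-zero exit code."""
    for command in build_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_preprocessing()` from the repository root runs the three scripts with the current Python interpreter.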

Training

Pre-trained models can be downloaded here; place them under checkpoints/trained_models.

To train on a single GPU node, run:

$ python train_music_translation.py

Generate

To generate samples from the audio in dataset/samples/input, run:

$ python generate_style_and_instrument_transfer.py
