
PYTORCH-WAVENET-VOCODER

This repository provides a WaveNet vocoder implementation with PyTorch.

Key features

  • Kaldi-like recipes that make the results easy to reproduce

  • Multi-GPU training / decoding

  • WORLD features or mel-spectrogram as auxiliary features

  • Recipes for three public databases

Requirements

  • python 3.6
  • virtualenv
  • cuda 8.0
  • cudnn 6
  • nccl 2.0+ (for multi-GPU use)

We recommend using a GPU with more than 10 GB of memory.
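Before running the setup, you can quickly confirm that the prerequisites above are visible on your system. This is only a sketch: the command names (e.g. `nvcc` for CUDA) depend on your installation and PATH.

```shell
# Sanity check for the prerequisites listed above.
# `nvcc` stands in for the CUDA toolkit; adjust names to your installation.
for cmd in python virtualenv nvcc; do
    if command -v "$cmd" >/dev/null 2>&1; then
        printf '%s: %s\n' "$cmd" "$("$cmd" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$cmd"
    fi
done
```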

Setup

$ git clone https://github.com/kan-bayashi/PytorchWaveNetVocoder.git
$ cd PytorchWaveNetVocoder/tools
$ make

How-to-run

$ cd egs/arctic/sd
$ ./run.sh

See egs/README.md for more details on the recipes.

Results

These are the subjective evaluation results obtained with the arctic recipe.

You can listen to the samples generated by our models here.

  • arctic_raw_16k.wav: original in arctic database
  • arctic_sd_16k_world.wav: sd model with world aux feats + noise shaping with world mcep
  • arctic_si-open_16k_world.wav: si-open model with world aux feats + noise shaping with world mcep
  • arctic_si-close_16k_world.wav: si-close model with world aux feats + noise shaping with world mcep
  • arctic_si-close_16k_melspc.wav: si-close model with mel-spectrogram aux feats
  • arctic_si-close_16k_melspc_ns.wav: si-close model with mel-spectrogram aux feats + noise shaping with stft mcep
  • ljspeech_raw_22.05k.wav: original in ljspeech database
  • ljspeech_sd_22.05k_world.wav: sd model with world aux feats + noise shaping with world mcep
  • ljspeech_sd_22.05k_melspc.wav: sd model with mel-spectrogram aux feats
  • ljspeech_sd_22.05k_melspc_ns.wav: sd model with mel-spectrogram aux feats + noise shaping with stft mcep
  • m-ailabs_raw_16k.wav: original in m-ailabs speech database
  • m-ailabs_sd_16k_melspc.wav: sd model with mel-spectrogram aux feats

References

Please cite the following articles.

@inproceedings{tamamori2017speaker,
  title={Speaker-dependent WaveNet vocoder},
  author={Tamamori, Akira and Hayashi, Tomoki and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proceedings of Interspeech},
  pages={1118--1122},
  year={2017}
}
@inproceedings{hayashi2017multi,
  title={An Investigation of Multi-Speaker Training for WaveNet Vocoder},
  author={Hayashi, Tomoki and Tamamori, Akira and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proc. ASRU 2017},
  year={2017}
}
@article{hayashi2018sp,
  title={An Investigation of Multi-Speaker WaveNet Vocoder (in Japanese)},
  author={Hayashi, Tomoki and Kobayashi, Kazuhiro and Tamamori, Akira and Takeda, Kazuya and Toda, Tomoki},
  journal={IEICE Technical Report},
  year={2018}
}

Author

Tomoki Hayashi @ Nagoya University
e-mail: hayashi.tomoki@g.sp.m.is.nagoya-u.ac.jp
