This repository provides a PyTorch implementation of QPNet (Quasi-Periodic WaveNet vocoder).
The generated samples can be found on our Demo page.
The repository includes three parts:
- Acoustic feature extraction: extracts spectral and prosodic features with WORLD
- QPNet vocoder (SI: speaker-independent; SD: speaker-dependent): generates speech from the input acoustic features
- QPNet for sinusoid generation [ongoing]: a toy demo for generating periodic sinusoids
This repository is tested on
- Python 3.6
- CUDA 10.0
- PyTorch 1.3
- torchvision 0.4.1
The code works with both anaconda and virtualenv.
The following example uses anaconda.
$ conda create -n venvQPNet python=3.6
$ source activate venvQPNet
$ pip install sprocket-vc
$ pip install torch torchvision
$ git clone https://github.com/bigpon/QPNet.git
- corpus: the folder for corpora; each corpus subfolder includes an scp subfolder for file lists and a wav subfolder for speech files
- qpnet_models: the folder for trained models
- qpnet_output: the folder for decoding output files
- src: the folder for source code
- Download the Voice Conversion Challenge 2018 (VCC2018) corpus to run the QPNet example
$ cd QPNet/corpus/VCC2018/wav/
$ wget -o train.log -O train.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip
$ wget -o eval.log -O eval.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip
$ wget -o ref.log -O ref.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_reference.zip
$ unzip train.zip
$ unzip eval.zip
$ unzip ref.zip
- SI-QPNet training set:
corpus/VCC2018/scp/vcc18tr.scp
- SD-QPNet updating set:
corpus/VCC2018/scp/vcc18up_VCC2SPK.scp
- SD-QPNet validation set:
corpus/VCC2018/scp/vcc18va_VCC2SPK.scp
- Testing set:
corpus/VCC2018/scp/vcc18eval.scp
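Each scp file is a plain list of wav paths, one per line. The excerpt below is only an illustrative sketch of what a training list such as vcc18tr.scp might contain; the root path, speaker names, and file names are placeholders:

```
/path/to/QPNet/corpus/VCC2018/wav/VCC2SF1/10001.wav
/path/to/QPNet/corpus/VCC2018/wav/VCC2SF1/10002.wav
/path/to/QPNet/corpus/VCC2018/wav/VCC2SM1/10001.wav
```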
- Modify the corresponding CUDA and project root paths in
src/utils/param_path.py
# move to the source code folder to run the following scripts
$ cd QPNet/src/
- Output the F0 and power distributions histogram figures to
corpus/VCC2018/hist/
$ bash run_FE.sh --stage 0
- Modify the f0_min (lower bound of the F0 range), f0_max (upper bound of the F0 range), and pow_th (power threshold for VAD) values of the speakers in
corpus/VCC2018/conf/pow_f0_dict.yml
*Details on setting the F0 ranges can be found here.
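Each speaker entry in pow_f0_dict.yml carries these three values; the layout below is only an assumed sketch (speaker name and numbers are illustrative, not taken from the repository):

```yaml
# Assumed sketch; check the generated pow_f0_dict.yml for the actual layout.
VCC2SF1:
  f0_min: 130   # lower bound of the F0 search range [Hz]
  f0_max: 400   # upper bound of the F0 search range [Hz]
  pow_th: -20   # power threshold for VAD [dB]
```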
- Extract and save acoustic features of the training, evaluation, and reference sets in
corpus/VCC2018/h5/
*The analysis-synthesis speech files of the training set are also saved in corpus/VCC2018/h5_restored/.
$ bash run_FE.sh --stage 123
- Process waveform files by noise shaping for QPNet training and save the shaped files in
corpus/VCC2018/wav_h5_ns/
$ bash run_FE.sh --stage 4
- Train and test SI-QPNet
# the gpu ID can be set by --gpu GPU_ID (default: 0)
$ bash run_QP.sh --gpu 0 --stage 03
- Update SD-QPNet for each speaker with the corresponding partial training data
$ bash run_QP.sh --gpu 0 --stage 1
- Validate SD-QPNet for each speaker with the corresponding partial training data
# the validation results are in `qpnet_models/modelname/validation_result.yml`
$ bash run_QP.sh --gpu 0 --stage 2
- Test SD-QPNet with the updating iteration number according to the validation results
# the iter number can be set by --miter NUM (default: 1000)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 4
- The program currently supports only WORLD acoustic features, but you can modify the feature extraction script and change 'feature_type' in
src/runFE.py
and src/runQP.py
for new features.
- You can extract acoustic features with different settings (e.g., frame length) and set a different 'feature_format' (default: h5) in
src/runFE.py
and src/runQP.py
for each setting, and the program will create the corresponding folders.
- You can easily change the generation model by setting a different 'network' (default: qpnet) in
src/runQP.py
when you create new generation models.
- When working with a new corpus, you only need to create the file lists of the wav files, because the program creates the feature lists based on the wav file lists.
- When you create the wav file lists, please follow the form of the example
(ex: rootpath/wav/xxx/xxx.wav).
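Since the feature lists are derived from the wav lists, building an scp file for a new corpus can be as simple as the sketch below. The corpus name "MyCorpus" and the scp file name are placeholders, not part of the repository:

```shell
# Hypothetical sketch: build a wav file list for a new corpus.
# "MyCorpus" and "mycorpus_all.scp" are placeholders; adjust to your layout.
mkdir -p corpus/MyCorpus/wav/spk1 corpus/MyCorpus/scp
# (place your speech files under corpus/MyCorpus/wav/<speaker>/ first)
find corpus/MyCorpus/wav -name '*.wav' | sort > corpus/MyCorpus/scp/mycorpus_all.scp
```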
The QPNet repository is developed based on
- PyTorch WaveNet implementation by @kan-bayashi
- Voice conversion implementation by @k2kobayashi
If you find the code helpful, please cite the following article.
@inproceedings{wu2019qpnet,
title={Quasi-Periodic WaveNet vocoder: a pitch dependent dilated convolution model for parametric speech generation},
author={Wu, Yi-Chiao and Hayashi, Tomoki and Tobing, Patrick Lumban and Kobayashi, Kazuhiro and Toda, Tomoki},
booktitle={Proceedings of Interspeech},
year={2019}
}
Development:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: yichiao.wu@g.sp.m.is.nagoya-u.ac.jp
Advisor:
Tomoki Toda @ Nagoya University
E-mail: tomoki@icts.nagoya-u.ac.jp