This repository provides a PyTorch implementation of QPNet (Quasi-Periodic WaveNet vocoder).
The generated samples can be found on our Demo page.
The repository includes three parts:
- Acoustic feature extraction: extracts spectral and prosodic features with WORLD
- QPNet vocoder (SI: speaker-independent; SD: speaker-dependent): generates speech from the input acoustic features
- QPNet for sinusoid generation [ongoing]: a toy demo for generating periodic sinusoids
This repository is tested on
- Python 3.6
- CUDA 10.0
- PyTorch 1.3
- torchvision 0.4.1
The code works with both anaconda and virtualenv.
The following example uses anaconda.
$ conda create -n venvQPNet python=3.6
$ source activate venvQPNet
$ pip install sprocket-vc
$ pip install torch torchvision
$ git clone https://github.com/bigpon/QPNet.git
- corpus: the folder for corpora; each corpus subfolder includes an scp subfolder for file lists and a wav subfolder for speech files
- qpnet_models: the folder for trained models
- qpnet_output: the folder for decoding output files
- src: the folder for source code
- Download the Voice Conversion Challenge 2018 (VCC2018) corpus to run the QPNet example
$ cd QPNet/corpus/VCC2018/wav/
$ wget -o train.log -O train.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip
$ wget -o eval.log -O eval.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip
$ wget -o ref.log -O ref.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_reference.zip
$ unzip train.zip
$ unzip eval.zip
$ unzip ref.zip
- SI-QPNet training set:
corpus/VCC2018/scp/vcc18tr.scp
- SD-QPNet updating set:
corpus/VCC2018/scp/vcc18up_VCC2SPK.scp
- SD-QPNet validation set:
corpus/VCC2018/scp/vcc18va_VCC2SPK.scp
- Testing set:
corpus/VCC2018/scp/vcc18eval.scp
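Each scp file is a plain list of wav paths, one per line. The excerpt below is only an illustrative sketch of what a training list such as vcc18tr.scp might contain; the root path, speaker names, and file names are placeholders:

```
/path/to/QPNet/corpus/VCC2018/wav/VCC2SF1/10001.wav
/path/to/QPNet/corpus/VCC2018/wav/VCC2SF1/10002.wav
/path/to/QPNet/corpus/VCC2018/wav/VCC2SM1/10001.wav
```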
- Modify the corresponding CUDA and project root paths in
src/utils/param_path.py
# move to the source code folder to run the following scripts
$ cd QPNet/src/
- Output the F0 and power distributions histogram figures to
corpus/VCC2018/hist/
$ bash run_FE.sh --stage 0
- Modify the f0_min (lower bound of the F0 range), f0_max (upper bound of the F0 range), and pow_th (power threshold for VAD) values of the speakers in
corpus/VCC2018/conf/pow_f0_dict.yml
*Details on setting the F0 ranges can be found here.
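Each speaker entry in pow_f0_dict.yml carries these three values; the layout below is only an assumed sketch (speaker name and numbers are illustrative, not taken from the repository):

```yaml
# Assumed sketch; check the generated pow_f0_dict.yml for the actual layout.
VCC2SF1:
  f0_min: 130   # lower bound of the F0 search range [Hz]
  f0_max: 400   # upper bound of the F0 search range [Hz]
  pow_th: -20   # power threshold for VAD [dB]
```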
- Extract and save acoustic features of the training, evaluation, and reference sets in
corpus/VCC2018/h5/
*The analysis-synthesis speech files of the training set are also saved in corpus/VCC2018/h5_restored/.
$ bash run_FE.sh --stage 123
- Process waveform files by noise shaping for QPNet training and save the shaped files in
corpus/VCC2018/wav_h5_ns/
$ bash run_FE.sh --stage 4
- Train and test SI-QPNet
# the gpu ID can be set by --gpu GPU_ID (default: 0)
$ bash run_QP.sh --gpu 0 --stage 03
- Update SD-QPNet for each speaker with the corresponding partial training data
$ bash run_QP.sh --gpu 0 --stage 1
- Validate SD-QPNet for each speaker with the corresponding partial training data
# the validation results are in `qpnet_models/modelname/validation_result.yml`
$ bash run_QP.sh --gpu 0 --stage 2
- Test SD-QPNet with the updating iteration number according to the validation results
# the iter number can be set by --miter NUM (default: 1000)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 4
- The program currently supports only WORLD acoustic features, but you can modify the feature extraction script and change 'feature_type' in
src/runFE.py
and src/runQP.py
for new features.
- You can extract acoustic features with different settings (e.g., frame length) and set a different 'feature_format' (default: h5) in
src/runFE.py
and src/runQP.py
for each setting, and the program will create the corresponding folders.
- You can easily change the generation model by setting a different 'network' (default: qpnet) in
src/runQP.py
when you create new generation models.
- When working with a new corpus, you only need to create the file lists of the wav files, because the program creates the feature lists based on the wav file lists.
- When you create the wav file lists, please follow the form of the example
(ex: rootpath/wav/xxx/xxx.wav).
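Since the feature lists are derived from the wav lists, building an scp file for a new corpus can be as simple as the sketch below. The corpus name "MyCorpus" and the scp file name are placeholders, not part of the repository:

```shell
# Hypothetical sketch: build a wav file list for a new corpus.
# "MyCorpus" and "mycorpus_all.scp" are placeholders; adjust to your layout.
mkdir -p corpus/MyCorpus/wav/spk1 corpus/MyCorpus/scp
# (place your speech files under corpus/MyCorpus/wav/<speaker>/ first)
find corpus/MyCorpus/wav -name '*.wav' | sort > corpus/MyCorpus/scp/mycorpus_all.scp
```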
The QPNet repository is developed based on
- PyTorch WaveNet implementation by @kan-bayashi
- Voice conversion implementation by @k2kobayashi
If you find the code helpful, please cite the following article.
@inproceedings{wu2019qpnet,
title={Quasi-Periodic WaveNet vocoder: a pitch dependent dilated convolution model for parametric speech generation},
author={Wu, Yi-Chiao and Hayashi, Tomoki and Tobing, Patrick Lumban and Kobayashi, Kazuhiro and Toda, Tomoki},
booktitle={Proceedings of Interspeech},
year={2019}
}
Development:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: yichiao.wu@g.sp.m.is.nagoya-u.ac.jp
Advisor:
Tomoki Toda @ Nagoya University
E-mail: tomoki@icts.nagoya-u.ac.jp