Quasi-Periodic WaveNet (QPNet)

Introduction

This repository provides a PyTorch implementation of Quasi-Periodic WaveNet (QPNet).

The generated samples can be found on our Demo page.

The repository includes three parts:

  1. Acoustic feature extraction
    to extract spectral and prosodic features with WORLD
  2. QPNet vocoder (SI: speaker-independent; SD: speaker-dependent)
    to generate speech based on the input acoustic features
  3. QPNet for sinusoid generation [ongoing]
    a toy demo for generating periodic sinusoids

Requirements

This repository is tested on

  • Python 3.6
  • CUDA 10.0
  • PyTorch 1.3
  • torchvision 0.4.1

Setup

The code works with both Anaconda and virtualenv.
The following example uses Anaconda.

$ conda create -n venvQPNet python=3.6
$ source activate venvQPNet
$ pip install sprocket-vc
$ pip install torch torchvision
$ git clone https://github.com/bigpon/QPNet.git

Folder architecture

  • corpus
    the folder to put corpora
    -- each corpus subfolder includes an scp subfolder for file lists and a wav subfolder for speech files
  • qpnet_models
    the folder for trained models
  • qpnet_output
    the folder for decoding output files
  • src
    the folder for source code

Example

Corpus download:

$ cd QPNet/corpus/VCC2018/wav/

$ wget -o train.log -O train.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip

$ wget -o eval.log -O eval.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip

$ wget -o ref.log -O ref.zip https://datashare.is.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_reference.zip

$ unzip train.zip
$ unzip eval.zip
$ unzip ref.zip
  • SI-QPNet training set: corpus/VCC2018/scp/vcc18tr.scp
  • SD-QPNet updating set: corpus/VCC2018/scp/vcc18up_VCC2SPK.scp (VCC2SPK is a placeholder for each speaker ID)
  • SD-QPNet validation set: corpus/VCC2018/scp/vcc18va_VCC2SPK.scp
  • Testing set: corpus/VCC2018/scp/vcc18eval.scp
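As a quick sanity check after downloading, the following sketch verifies that every file in the training list exists. It assumes one wav path per line, as described in the Hints section.

from pathlib import Path

# Sanity-check sketch: confirm every wav path listed in the
# training set actually exists on disk.
scp = Path("corpus/VCC2018/scp/vcc18tr.scp")
paths = [line.strip() for line in scp.read_text().splitlines() if line.strip()]
missing = [p for p in paths if not Path(p).exists()]
print(f"{len(paths)} files listed, {len(missing)} missing")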

Path setup:

  • Modify the corresponding CUDA and project root paths in src/utils/param_path.py
# move to the source code folder to run the following scripts
$ cd QPNet/src/
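The variable names below are hypothetical; the sketch only illustrates the kind of edit expected in src/utils/param_path.py, so check the file itself for the actual names.

# Hypothetical sketch of the edit expected in src/utils/param_path.py;
# the real variable names in that file may differ.
CUDA_PATH = "/usr/local/cuda-10.0"   # path to your CUDA installation
PROJECT_ROOT = "/home/user/QPNet"    # path to your clone of this repository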

Feature extraction:

  1. Output the F0 and power distribution histogram figures to corpus/VCC2018/hist/
$ bash run_FE.sh --stage 0
  2. Modify the f0_min (lower bound of the F0 range), f0_max (upper bound of the F0 range), and pow_th (power threshold for VAD) values of the speakers in corpus/VCC2018/conf/pow_f0_dict.yml (a sketch for inspecting this file follows the list)
    *Details on setting the F0 ranges can be found here.

  3. Extract and save acoustic features of the training, evaluation, and reference sets in corpus/VCC2018/h5/
    *The analysis-synthesis speech files of the training set are also saved in corpus/VCC2018/h5_restored/.

$ bash run_FE.sh --stage 123
  4. Process the waveform files by noise shaping for QPNet training and save the shaped files in corpus/VCC2018/wav_h5_ns/
$ bash run_FE.sh --stage 4
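After editing pow_f0_dict.yml, the per-speaker settings can be double-checked along these lines. PyYAML is assumed to be available (sprocket-vc depends on it), and the exact key layout inside the file is an assumption, so adapt as needed.

import yaml  # PyYAML; assumed available via the sprocket-vc install

# Sketch: print the per-speaker f0_min / f0_max / pow_th settings.
# The exact key layout inside pow_f0_dict.yml is an assumption.
with open("corpus/VCC2018/conf/pow_f0_dict.yml") as f:
    conf = yaml.safe_load(f)
for speaker, settings in conf.items():
    print(speaker, settings)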

QPNet vocoder:

  1. Train and test SI-QPNet
# the gpu ID can be set by --gpu GPU_ID (default: 0)
$ bash run_QP.sh --gpu 0 --stage 03
  2. Update SD-QPNet for each speaker with the corresponding partial training data
$ bash run_QP.sh --gpu 0 --stage 1
  3. Validate SD-QPNet for each speaker with the corresponding partial training data
# the validation results are in `qpnet_models/modelname/validation_result.yml`
# (a sketch for inspecting them follows the list)
$ bash run_QP.sh --gpu 0 --stage 2
  4. Test SD-QPNet with the updating iteration number chosen according to the validation results
# the iter number can be set by --miter NUM (default: 1000)
$ bash run_QP.sh --gpu 0 --miter 1000 --stage 4
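To choose --miter from the validation results, one might first inspect validation_result.yml along these lines. Its internal structure is not documented here, so print it first and adapt the selection logic to what you see; "modelname" is the placeholder used above.

import yaml  # PyYAML; assumed available

# Sketch: inspect the validation results before choosing --miter.
# The file's internal structure is an assumption; adapt the logic
# to whatever this print shows (e.g., losses per updating iteration).
with open("qpnet_models/modelname/validation_result.yml") as f:
    results = yaml.safe_load(f)
print(results)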

Hints

  • The program currently supports only WORLD acoustic features, but you can modify the feature extraction script and change the 'feature_type' in src/runFE.py and src/runQP.py for new features.

  • You can extract acoustic features with different settings (e.g., frame length) and set a different 'feature_format' (default: h5) in src/runFE.py and src/runQP.py for each setting; the program will create the corresponding folders.

  • You can easily change the generation model by setting a different 'network' (default: qpnet) in src/runQP.py when you create new generation models.

  • When working with a new corpus, you only need to create the file lists of the wav files, because the program will create the feature lists based on the wav file lists.

  • When you create the wav file lists, please follow the format of the example
    (e.g., rootpath/wav/xxx/xxx.wav); a sketch for generating such a list follows.
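For a new corpus, a minimal sketch that writes such a list; the corpus name and .scp filename below are hypothetical.

from pathlib import Path

# Sketch: build a wav file list for a new corpus, one path per line in
# the rootpath/wav/xxx/xxx.wav form described above. "mycorpus" and
# "mycorpus_train.scp" are hypothetical names.
root = Path("corpus/mycorpus")
wavs = sorted((root / "wav").rglob("*.wav"))
(root / "scp").mkdir(parents=True, exist_ok=True)
(root / "scp" / "mycorpus_train.scp").write_text(
    "\n".join(str(p) for p in wavs) + "\n")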

References

The QPNet repository is developed based on

Citation

If you find the code helpful, please cite the following article.

@inproceedings{wu2019qpnet,
  title={Quasi-Periodic WaveNet vocoder: a pitch dependent dilated convolution model for parametric speech generation},
  author={Wu, Yi-Chiao and Hayashi, Tomoki and Tobing, Patrick Lumban and Kobayashi, Kazuhiro and Toda, Tomoki},
  booktitle={Proceedings of Interspeech},
  year={2019}
}

Authors

Development:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: yichiao.wu@g.sp.m.is.nagoya-u.ac.jp

Advisor:
Tomoki Toda @ Nagoya University
E-mail: tomoki@icts.nagoya-u.ac.jp
