
koreanVoiceSample

This project is based on https://github.com/GSByeon/multi-speaker-tacotron-tensorflow

Its goal is to build Korean voice sample data in the same format as LJ Speech (https://keithito.com/LJ-Speech-Dataset/).

Generate custom datasets

The datasets directory should look like:

datasets
├── default
│   ├── metadata.csv
│   └── wavs
│       ├── 1.wav
│       ├── 2.wav
│       ├── 3.wav
│       └── ...

metadata.csv has one line per audio file in the form: wav-filename|text|normalized-text
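
For example, entries in metadata.csv might look like the following (filenames and sentences are made up for illustration; the third field is the text with numbers spelled out):

1.wav|오늘은 3월 1일입니다.|오늘은 삼월 일일입니다.
2.wav|기온은 15도였다.|기온은 십오 도였다.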

Install

pip install -r requirements.txt

python -c "import nltk; nltk.download('punkt')"

If you are on Windows, ffmpeg is also required:

http://adaptivesamples.com/how-to-install-ffmpeg-on-windows/

If you have problems installing hangulize, check the link below:

https://github.com/sublee/hangulize

Usage

Run the commands below, one per step (the examples use the default dataset).

  1. To automatically align audio with text, set GOOGLE_APPLICATION_CREDENTIALS so the Google Speech Recognition API can be used. To get credentials, see the Google Cloud documentation.

export GOOGLE_APPLICATION_CREDENTIALS="YOUR-GOOGLE.CREDENTIALS.json"
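
An optional sanity check (not part of the project's scripts) to confirm the credentials file is visible before running the recognition step:

import os

cred = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
assert cred and os.path.isfile(cred), "Set GOOGLE_APPLICATION_CREDENTIALS to your JSON key file"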

  2. Download the speech (or video) and text.

python -m dataproc.download

  3. Segment all audio files on silence.

python -m audio.silence --audio_pattern "./datasets/default/wavs/*.wav" --method=pydub
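
The --method=pydub option splits on silence. The sketch below shows the general idea using pydub's split_on_silence; it is illustrative only, and the thresholds and output filenames are assumptions, not the project's actual parameters:

from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_wav("./datasets/default/wavs/1.wav")

# Cut wherever at least 500 ms stays below -40 dBFS; keep a little silence
# around each chunk so words are not clipped.
chunks = split_on_silence(audio, min_silence_len=500, silence_thresh=-40, keep_silence=200)

for i, chunk in enumerate(chunks):
    chunk.export("./datasets/default/wavs/1.{:04d}.wav".format(i), format="wav")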

  4. Using the Google Speech Recognition API, predict a sentence for each segmented audio file.

python -m recognition.google --audio_pattern "./datasets/default/wavs/*.*.wav"
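
Under the hood, this step sends each audio segment to Google's Speech-to-Text service. Below is a minimal sketch of such a request with the current google-cloud-speech client; the project's recognition.google module may use an older client API, and the sample rate and file path are assumptions:

from google.cloud import speech

client = speech.SpeechClient()  # picks up GOOGLE_APPLICATION_CREDENTIALS

with open("./datasets/default/wavs/1.0000.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,  # assumed; match your wav files
    language_code="ko-KR",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)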

  5. Normalize the Korean text (e.g., numbers).

python -m recognition.normalize --recognition_path "./datasets/default/recognition.json"
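
Normalization spells out digits and similar tokens as Korean words. The toy function below only illustrates the idea with a naive digit-by-digit Sino-Korean substitution; it is not the project's normalizer:

DIGITS = {"0": "영", "1": "일", "2": "이", "3": "삼", "4": "사",
          "5": "오", "6": "육", "7": "칠", "8": "팔", "9": "구"}

def naive_normalize(text):
    # Replace each digit with its Sino-Korean reading, character by character.
    return "".join(DIGITS.get(ch, ch) for ch in text)

print(naive_normalize("오늘은 3월 1일입니다."))  # -> 오늘은 삼월 일일입니다.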

  6. Done.
