anirudh9119/SpeechSyn

The mel-generalized cepstrum (MGC) is a compact, approximate representation of the spectral envelope of a speech signal, computed on a per-frame basis. For each frame (32 ms of speech, with an overlap of 28 ms between successive frames, i.e., a hop of 4 ms), we estimate MGC coefficients that correspond to a filter whose frequency response approximates the log-magnitude spectrum of the speech frame.
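As a rough illustration of the framing described above, the sketch below slices a signal into 32 ms frames with 28 ms of overlap. The 16 kHz sample rate and the function name `frame_signal` are assumptions made for this example only; they are not taken from the repository, and the MGC estimation itself is not shown.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=32.0, overlap_ms=28.0):
    """Split a 1-D signal into overlapping frames (one frame per row).

    sample_rate is an assumed value; the frame and overlap durations
    follow the figures quoted in the text.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * (frame_ms - overlap_ms) / 1000.0))
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

# One second of placeholder noise standing in for real speech.
signal = np.random.default_rng(0).standard_normal(16000)
frames = frame_signal(signal)
```

At the assumed 16 kHz rate, each frame is 512 samples long and consecutive frames start 64 samples apart, so every frame shares 448 samples with its predecessor.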

To generate speech from those coefficients, we use the pitch (a variable related to the fundamental frequency of a speech frame) to generate an excitation signal, which is then passed through the filter found in the MGC estimation step. In unvoiced segments (i.e., segments where the excitation is noisy because the vocal cords are not vibrating), we currently use noise as the excitation (either Gaussian noise or a maximum length sequence). The diagram below helps to visualize this:

[Figure: block diagram of the synthesizer]

Our models generate values related to the two inputs in this diagram: smoothed FFT amplitudes (which are converted to MGC coefficients) and pitch. The circle represents a binary decision the synthesizer makes based on the pitch value: if pitch > 0, it uses the excitation generated by the upper branch; otherwise it uses the noise from the lower branch.
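The voiced/unvoiced switch described above can be sketched as follows. This is only an illustration under several assumptions: the pitch value is treated as a fundamental frequency in Hz, the voiced excitation is a simple pulse train, the unvoiced excitation is Gaussian noise, and the sample rate and hop length (16 kHz, 64 samples) are example values. The function name `make_excitation` is hypothetical, and the MGC synthesis filter that would follow is not shown.

```python
import numpy as np

def make_excitation(f0_per_frame, sample_rate=16000, hop_len=64, seed=0):
    """Build an excitation signal from per-frame pitch values.

    Assumes each pitch value is an F0 in Hz, with 0 marking an
    unvoiced frame. Voiced frames get a pulse train, unvoiced frames
    get Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    excitation = np.zeros(len(f0_per_frame) * hop_len)
    next_pulse = 0.0  # position (in samples) of the next pulse
    for i, f0 in enumerate(f0_per_frame):
        start, stop = i * hop_len, (i + 1) * hop_len
        if f0 > 0:
            # Upper branch: pulses spaced one pitch period apart.
            period = sample_rate / f0
            next_pulse = max(next_pulse, start)
            while next_pulse < stop:
                # sqrt(period) keeps the pulse train at roughly unit power.
                excitation[int(next_pulse)] = np.sqrt(period)
                next_pulse += period
        else:
            # Lower branch: noise excitation for unvoiced frames.
            excitation[start:stop] = rng.standard_normal(hop_len)
            next_pulse = stop
    return excitation

# Two voiced frames at 250 Hz followed by two unvoiced frames.
exc = make_excitation([250.0, 250.0, 0.0, 0.0])
```

Carrying `next_pulse` across frames keeps the pulse spacing continuous when the pitch period is longer than one hop, which is a design choice of this sketch rather than something stated in the text.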

To train the models, we extract these features from real speech and then train the models to predict the next frame from the previously seen frames. Our current model is a stack of GRUs with feedforward layers between the model input and the first GRU input, and between the last GRU output and the model output. The training criterion is based on the mixture density network principle: the model outputs the means and standard deviations of a GMM, and to generate from this output we sample from that GMM. For the pitch, we additionally have a binary (Bernoulli) output that represents the voicing decision (pitch > 0 or pitch == 0). (José: please let me know if there's anything wrong in this explanation!)
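To make the generation step concrete, here is a minimal sketch of drawing one frame from a mixture-density output. It assumes a diagonal-covariance GMM, that the model also provides mixture weights, and that the pitch is stored as the last element of the frame vector. None of these details are stated above, and the function name `sample_frame` is hypothetical.

```python
import numpy as np

def sample_frame(weights, means, stds, voiced_prob, rng):
    """Sample one output frame from hypothetical mixture-density outputs.

    weights:     (K,) mixture weights summing to 1 (assumed output)
    means, stds: (K, D) per-component means and standard deviations
    voiced_prob: probability that the frame is voiced (pitch > 0)
    """
    # Pick a mixture component, then sample from its diagonal Gaussian.
    k = rng.choice(len(weights), p=weights)
    frame = means[k] + stds[k] * rng.standard_normal(means.shape[1])
    # Binary voicing decision: force the pitch to 0 for unvoiced frames.
    if rng.random() >= voiced_prob:
        frame[-1] = 0.0
    return frame

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[1.0, 2.0, 100.0], [3.0, 4.0, 200.0]])
stds = np.zeros_like(means)  # zero spread, so samples equal a component mean
unvoiced = sample_frame(weights, means, stds, voiced_prob=0.0, rng=rng)
voiced = sample_frame(weights, means, stds, voiced_prob=1.0, rng=rng)
```

In a generation loop, each sampled frame would be fed back as the input for predicting the next one, consistent with the next-frame prediction setup described above.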
