from data_generator import vis_train_features, plot_raw_audio, plot_spectrogram_feature, plot_mfcc_feature

def plot_audio_visualizations(index=0):
    # extract the transcript, raw audio, features, and audio path for one training example
    vis_text, vis_raw_audio, vis_mfcc_feature, vis_spectrogram_feature, vis_audio_path = vis_train_features(
        index=index)
    # plot the normalized spectrogram, MFCC, and raw audio visualizations
    plot_spectrogram_feature(vis_spectrogram_feature)
    plot_mfcc_feature(vis_mfcc_feature)
    plot_raw_audio(vis_raw_audio)
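# As a quick usage example, the call below plots all three visualizations
# for the first training example:
plot_audio_visualizations(index=0)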
from IPython.display import Markdown, display

# print shape of spectrogram
display(
    Markdown('**Shape of Spectrogram** : ' +
             str(vis_spectrogram_feature.shape)))

# ### Mel-Frequency Cepstral Coefficients (MFCCs)
#
# The second option for an audio feature representation is [MFCCs](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum).  You do **not** need to dig deeply into the details of how MFCCs are calculated, but if you would like more information, you are welcome to peruse the [documentation](https://github.com/jameslyons/python_speech_features) of the `python_speech_features` Python package.  Just as with the spectrogram features, the MFCCs are normalized in the supplied code.
#
# The main idea behind MFCC features is the same as spectrogram features: at each time window, the MFCC feature yields a feature vector that characterizes the sound within the window.  Note that the MFCC feature is much lower-dimensional than the spectrogram feature, which could help an acoustic model to avoid overfitting to the training dataset.

# In[4]:

from data_generator import plot_mfcc_feature

# plot normalized MFCC
plot_mfcc_feature(vis_mfcc_feature)
# print shape of MFCC
display(Markdown('**Shape of MFCC** : ' + str(vis_mfcc_feature.shape)))
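
# To make the MFCC pipeline concrete, the sketch below shows one way such
# features could be computed and normalized with `python_speech_features`.
# `compute_normalized_mfcc` is a hypothetical helper for illustration, not the
# project's supplied code; it relies on the package's default 25 ms windows
# with a 10 ms step.

import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc

def compute_normalized_mfcc(audio_path, numcep=13, eps=1e-14):
    # load the raw audio samples and their sampling rate
    rate, signal = wavfile.read(audio_path)
    # one numcep-dimensional feature vector per 10 ms step (25 ms window)
    features = mfcc(signal, samplerate=rate, numcep=numcep)
    # normalize each coefficient to zero mean and unit variance
    return (features - np.mean(features, axis=0)) / (np.std(features, axis=0) + eps)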

# When you construct your pipeline, you will choose between spectrogram and MFCC features.  If you would like to see different implementations that make use of MFCCs and/or spectrograms, please check out the links below:
# - This [repository](https://github.com/baidu-research/ba-dls-deepspeech) uses spectrograms.
# - This [repository](https://github.com/mozilla/DeepSpeech) uses MFCCs.
# - This [repository](https://github.com/buriburisuri/speech-to-text-wavenet) also uses MFCCs.
# - This [repository](https://github.com/pannous/tensorflow-speech-recognition/blob/master/speech_data.py) experiments with raw audio, spectrograms, and MFCCs as features.

# <a id='step2'></a>
# ## STEP 2: Deep Neural Networks for Acoustic Modeling
#
# In this section, you will experiment with various neural network architectures for acoustic modeling.
#
# You will begin by training five relatively simple architectures.  **Model 0** is provided for you.  You will write code to implement **Models 1**, **2**, **3**, and **4**.  If you would like to experiment further, you are welcome to create and train more models under the **Models 5+** heading.
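#
# As a rough sketch of what a "relatively simple" acoustic model can look like
# (an illustration only, not necessarily the provided **Model 0**), the code
# below maps each time step's feature vector through a single GRU layer to a
# per-time-step softmax over characters; the 29-way output assumes 26 letters
# plus space, apostrophe, and a CTC blank.

from keras.models import Model
from keras.layers import Input, GRU, Activation

def simple_rnn_sketch(input_dim, output_dim=29):
    # acoustic features arrive as (time_steps, input_dim); time length varies
    input_data = Input(name='the_input', shape=(None, input_dim))
    # a single recurrent layer emits one output vector per time step
    rnn = GRU(output_dim, return_sequences=True, name='rnn')(input_data)
    # per-time-step softmax over the character set
    y_pred = Activation('softmax', name='softmax')(rnn)
    return Model(inputs=input_data, outputs=y_pred)

# e.g., 13-dimensional MFCC features as input
model = simple_rnn_sketch(input_dim=13)
model.summary()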