# We begin by investigating the dataset that will be used to train and evaluate your pipeline. [LibriSpeech](http://www.danielpovey.com/files/2015_icassp_librispeech.pdf) is a large corpus of read English speech, designed for training and evaluating ASR models. The dataset contains 1000 hours of speech derived from audiobooks. We will work with a small subset in this project, since training on more of the data would take much longer. However, after completing this project, if you are interested in exploring further, you are encouraged to work with more of the data that is provided [online](http://www.openslr.org/12/).
#
# In the code cells below, you will use the `vis_train_features` function to visualize a training example. The supplied argument `index=0` tells the function to extract the first example in the training set. (You are welcome to change `index=0` to point to a different training example, if you like, but please **DO NOT** amend any other code in the cell.) The returned variables are:
# - `vis_text` - transcribed text (label) for the training example.
# - `vis_raw_audio` - raw audio waveform for the training example.
# - `vis_mfcc_feature` - mel-frequency cepstral coefficients (MFCCs) for the training example.
# - `vis_spectrogram_feature` - spectrogram for the training example.
# - `vis_audio_path` - the file path to the training example.

# In[1]:

from data_generator import vis_train_features

# extract label and audio features for a single training example
vis_text, vis_raw_audio, vis_mfcc_feature, vis_spectrogram_feature, vis_audio_path = vis_train_features(index=0)

# The following code cell visualizes the audio waveform for your chosen example, along with the corresponding transcript. You also have the option to play the audio in the notebook!

# In[23]:

from IPython.display import Audio, Markdown, display
from data_generator import vis_train_features, plot_raw_audio
get_ipython().run_line_magic('matplotlib', 'inline')

# plot audio signal
plot_raw_audio(vis_raw_audio)
# print length of audio signal
display(Markdown('**Shape of Audio Signal** : ' + str(vis_raw_audio.shape)))
# print transcript corresponding to audio clip

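# The spectrogram and MFCC plotting helpers are assumed to live in
# data_generator alongside plot_raw_audio.
from data_generator import plot_mfcc_feature, plot_raw_audio, plot_spectrogram_feature


def plot_audio_visualizations(index=0):
    """Plot the spectrogram, MFCCs, and raw waveform for one training example."""
    vis_text, vis_raw_audio, vis_mfcc_feature, vis_spectrogram_feature, vis_audio_path = vis_train_features(index=index)
    plot_spectrogram_feature(vis_spectrogram_feature)
    plot_mfcc_feature(vis_mfcc_feature)
    plot_raw_audio(vis_raw_audio)
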

def load_data():
    vis_text, vis_raw_audio, vis_mfcc_feature, vis_spectrogram_feature, vis_audio_path = vis_train_features()
    return vis_text, vis_raw_audio, vis_mfcc_feature, vis_spectrogram_feature, vis_audio_path
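
# To make the `vis_spectrogram_feature` variable less of a black box, the sketch below shows one common way a log-magnitude spectrogram can be derived from a raw waveform such as `vis_raw_audio`. It is an illustration only, not the code used inside `data_generator`: the function name `sketch_spectrogram`, the 16 kHz sample rate, and the 20 ms window / 10 ms step are all assumptions chosen for the example.

```python
import numpy as np


def sketch_spectrogram(samples, sample_rate=16000, window_ms=20, step_ms=10, eps=1e-14):
    """Log-magnitude spectrogram from a Hann-windowed short-time FFT (illustrative)."""
    samples = np.asarray(samples, dtype=np.float64)
    window = int(sample_rate * window_ms / 1000)  # samples per analysis frame
    step = int(sample_rate * step_ms / 1000)      # samples between frame starts
    if len(samples) < window:
        raise ValueError("audio is shorter than one analysis window")

    # Slice the waveform into overlapping frames: shape (n_frames, window).
    n_frames = 1 + (len(samples) - window) // step
    frames = np.stack([samples[i * step:i * step + window] for i in range(n_frames)])

    # Taper each frame to reduce spectral leakage, then take the real FFT,
    # which keeps the window // 2 + 1 non-negative frequency bins.
    magnitude = np.abs(np.fft.rfft(frames * np.hanning(window), axis=1))

    # Log compression; eps avoids log(0) on silent frames.
    return np.log(magnitude + eps)
```

# Under these assumptions, calling `sketch_spectrogram(vis_raw_audio)` would return an array with one row per 10 ms time step and one column per frequency bin. The values produced by the project's own feature extraction may differ in scaling and normalization.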