Jingju Singing Syllable Segmentation

The code in this repository aims to help reproduce the results of the following work:

Jordi Pons, Rong Gong, and Xavier Serra. 2017. Score-informed Syllable Segmentation for A Cappella Singing Voice with Convolutional Neural Networks. In 18th International Society for Music Information Retrieval Conference. Suzhou, China.

This paper introduces a new score-informed method for the segmentation of jingju a cappella singing voice into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and the score. Throughout the paper, we first examine the structure of jingju syllables and propose a definition of the term “syllable onset”. Then, we identify the challenges that jingju a cappella singing poses. We propose using a score-informed Viterbi algorithm, instead of thresholding the onset detection function, because the available musical knowledge can be used to inform the Viterbi algorithm and thereby overcome the identified challenges. In addition, we investigate how to improve the syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that efficiently captures different time-frequency scales for estimating syllable onsets. The proposed method outperforms the state of the art in syllable segmentation for jingju a cappella singing. We further provide an analysis of the segmentation errors, which points to possible research directions.
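The score-informed decoding can be pictured as a dynamic program that places one boundary per syllable so as to maximize the ODF values at the chosen frames while staying close to the durations given by the score. The sketch below is a minimal illustration of that idea, not the repository's implementation: the function name decode_boundaries, the Gaussian duration prior, and the parameters sigma and hop_dur are all assumptions made for the example.

```python
import numpy as np

def decode_boundaries(odf, score_durs, sigma=0.1, hop_dur=0.01):
    """Hypothetical sketch: place one onset per syllable by dynamic
    programming, scoring each candidate frame by its ODF log-probability
    plus a Gaussian prior on the score-given duration of the previous
    syllable. Quadratic in the number of frames; for clarity only."""
    odf = np.asarray(odf, dtype=float)
    n_frames, n_onsets = len(odf), len(score_durs)
    obs = np.log(np.maximum(odf, 1e-8))     # observation log-probs
    delta = np.full((n_onsets, n_frames), -np.inf)
    psi = np.zeros((n_onsets, n_frames), dtype=int)
    delta[0] = obs                          # flat prior on the first onset
    for k in range(1, n_onsets):
        for t in range(k, n_frames):
            # duration from every earlier candidate frame to frame t
            durs = (t - np.arange(t)) * hop_dur
            prior = -0.5 * ((durs - score_durs[k - 1]) / sigma) ** 2
            cand = delta[k - 1, :t] + prior
            best = int(np.argmax(cand))
            psi[k, t] = best
            delta[k, t] = cand[best] + obs[t]
    # backtrack from the best final boundary
    path = [int(np.argmax(delta[-1]))]
    for k in range(n_onsets - 1, 0, -1):
        path.append(int(psi[k, path[-1]]))
    return path[::-1]                       # boundary frame indices
```

The actual system derives its duration model from the musical score; the quadratic search above is fine for short phrases, but a real implementation would prune the candidate transitions.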

Steps to reproduce the experiment results

  1. Clone this repository
  2. Download the Jingju a cappella singing dataset, scores and syllable boundary annotations from https://goo.gl/y0P7BL
  3. Set the dataset_root_path variable in src/filePath.py to point to the downloaded dataset
  4. Python 2.7.9 and Essentia 2.1-beta3 were used in the paper; install the Python dependencies from requirements.txt
  5. Set the mth_ODF, layer2, fusion and filter_shape variables in src/parameters.py (a hypothetical example follows this list)
  6. Run python onsetFunctionCalc.py to produce the experiment results for the above parameter setting
  7. Run python eval_demo.py to produce the evaluation results
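The exact values each variable in step 5 accepts are defined in src/parameters.py itself; the snippet below is only a hypothetical illustration of the kind of configuration involved, with made-up values, not a verified setting.

```python
# src/parameters.py -- illustrative values only; the accepted options are
# documented in the file itself.
mth_ODF = 'jan'            # hypothetical: which ODF estimation method to use
layer2 = 20                # hypothetical: size of the second CNN layer
fusion = False             # hypothetical: whether to fuse two ODF streams
filter_shape = 'temporal'  # hypothetical: CNN filter-shape variant
```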

Steps to train CNN acoustic models

  1. Follow steps 1 to 4 of Steps to reproduce the experiment results
  2. Run python trainingSampleCollection.py to calculate the mel-band features (a sketch of this kind of feature extraction follows this list)
  3. The CNN model training code is located in the localDLScripts folder. Use the script that matches your computing configuration (CPU or GPU).
  4. Pre-trained models are located in the cnnModels folder
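For step 2, trainingSampleCollection.py is the authoritative implementation; the sketch below only illustrates, using Essentia's standard-mode Python API, how log mel-band features of this kind can be computed. The frame size, hop size and number of bands are assumptions for the example, not the paper's exact settings.

```python
import numpy as np
import essentia.standard as es

def log_mel_bands(audio_path, sr=44100, frame_size=2048,
                  hop_size=441, n_bands=80):
    """Hypothetical sketch of log mel-band feature extraction."""
    audio = es.MonoLoader(filename=audio_path, sampleRate=sr)()
    window = es.Windowing(type='hann')
    spectrum = es.Spectrum()                 # magnitude spectrum
    mel = es.MelBands(numberBands=n_bands, sampleRate=sr)
    feats = [mel(spectrum(window(frame)))
             for frame in es.FrameGenerator(audio, frameSize=frame_size,
                                            hopSize=hop_size)]
    return np.log(np.array(feats) + 1e-8)    # log compression
```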

Dependencies

numpy, scipy, matplotlib, essentia, scikit-learn, cython, keras, theano, hyperopt

License

GNU Affero General Public License version 3 (AGPLv3)