benchmarking

Benchmarking scripts for various applications of Montreal Corpus Tools

The MFA folder contains several scripts whose names begin with benchmark_aligner, one per dataset. There are currently scripts that align the LibriSpeech corpus and the lab datasets for Quebec French, English, and Tagalog. If dict_path is set to None, the script runs with the --nodict option (as in the Tagalog script). The paths to the relevant directories and the number of jobs can be changed at the top of each script. The models produced by alignment are saved as zip archives.
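As a rough illustration of how the settings at the top of a benchmark_aligner script might drive a single run, here is a minimal Python sketch. Only dict_path, the job count and the --nodict option come from this README; the placeholder executable `ALIGNER`, the positional argument order and the `-j` flag are assumptions, so check them against the MFA version you actually use.

```python
import shlex
from typing import List, Optional

# Settings of the kind edited at the top of each benchmark_aligner script
# (variable names other than dict_path are illustrative).
corpus_dir = "/path/to/corpus"
dict_path: Optional[str] = None  # None -> align without a pronunciation dictionary
output_dir = "/path/to/aligned_output"
num_jobs = 4

# Placeholder: substitute the aligner command provided by your MFA install.
ALIGNER = "path/to/aligner"


def build_align_command(corpus: str, dictionary: Optional[str],
                        output: str, jobs: int) -> List[str]:
    """Return the argument list for one alignment run.

    Only --nodict is taken from the README; the argument order and
    the -j flag are assumptions.
    """
    cmd = [ALIGNER, corpus]
    if dictionary is not None:
        cmd.append(dictionary)
    cmd.append(output)
    if dictionary is None:
        cmd.append("--nodict")  # dictionary-free mode, as in the Tagalog script
    cmd += ["-j", str(jobs)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_align_command(corpus_dir, dict_path, output_dir, num_jobs)))
```

Building the command as a list rather than a single string lets it be passed straight to `subprocess.run` without shell quoting issues.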

The reorganize_french_corpus.py script restructures the Quebec French dataset into a usable format for alignment.

The librispeech_to_chapters.py script organizes the LibriSpeech corpus into speaker folders that contain textgrids for each chapter.
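The LibriSpeech distribution stores each chapter's transcript in a `<speaker>-<chapter>.trans.txt` file, with one `<utterance-id> <TRANSCRIPT>` line per utterance. A minimal sketch of the grouping step that a chapter-level reorganization needs might look like the following; the helper names and the output naming scheme are hypothetical, and writing the TextGrid itself (which also needs utterance durations from the audio) is left out.

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable, Iterator, List, Tuple

Utterance = Tuple[str, str]  # (utterance_id, transcript)


def iter_trans_files(root: str) -> Iterator[Path]:
    """Yield every chapter transcript file under a LibriSpeech root."""
    return Path(root).rglob("*.trans.txt")


def parse_trans_lines(lines: Iterable[str]) -> Dict[Tuple[str, str], List[Utterance]]:
    """Group transcript lines by (speaker, chapter), preserving line order."""
    chapters: Dict[Tuple[str, str], List[Utterance]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, text = line.split(" ", 1)
        # Utterance IDs have the form <speaker>-<chapter>-<index>.
        speaker, chapter, _ = utt_id.split("-")
        chapters[(speaker, chapter)].append((utt_id, text))
    return dict(chapters)


def chapter_textgrid_path(out_root: str, speaker: str, chapter: str) -> PurePosixPath:
    # Hypothetical layout: <out_root>/<speaker>/<speaker>-<chapter>.TextGrid
    return PurePosixPath(out_root) / speaker / f"{speaker}-{chapter}.TextGrid"
```

For example, feeding the lines of every file from `iter_trans_files` into `parse_trans_lines` yields one entry per chapter, and `chapter_textgrid_path` gives the speaker folder and file name that chapter's TextGrid would be written to.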

The comparetextgrids.py script takes the paths to two aligned corpora as command-line arguments and outputs a CSV file showing the average differences in word, phone, and segment-of-interest (SOI) alignment, as well as the difference in the number of 'sil' segments. If a TextGrid in one corpus has no counterpart in the other, no output is produced for it. If a TextGrid does not mark any segments of interest, its SOI column in the CSV is left blank. When the two alignments have different phone counts, both counts are listed and no average difference is given.
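The comparison rules described above can be sketched in Python, assuming the TextGrids have already been read into lists of `(start, end, label)` intervals per tier. The helper names, the CSV column names, averaging over both start and end boundaries, and counting 'sil' on the phone tier are all assumptions for illustration, not necessarily what comparetextgrids.py does.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, Tuple

# (start_time, end_time, label); reading TextGrids into this form is assumed done elsewhere.
Interval = Tuple[float, float, str]
Tiers = Dict[str, List[Interval]]  # keys: "words", "phones", optionally "soi"

HEADER = ["file", "word_diff", "phone_diff", "soi_diff", "sil_count_diff"]


def paired_names(names_a: Sequence[str], names_b: Sequence[str]) -> List[str]:
    """Keep only files present in both corpora; unmatched files produce no row."""
    return sorted(set(names_a) & set(names_b))


def mean_boundary_diff(a: Sequence[Interval], b: Sequence[Interval]) -> Optional[float]:
    """Mean absolute difference over the start and end boundaries of paired segments.

    Returns None when the segment counts differ (or a tier is empty),
    because the segments cannot then be paired one-to-one.
    """
    if len(a) != len(b) or not a:
        return None
    total = sum(abs(x[0] - y[0]) + abs(x[1] - y[1]) for x, y in zip(a, b))
    return total / (2 * len(a))


def sil_count(tier: Sequence[Interval]) -> int:
    return sum(1 for _, _, label in tier if label == "sil")


def fmt(diff: Optional[float]) -> str:
    return "" if diff is None else f"{diff:.4f}"


def compare_row(name: str, a: Tiers, b: Tiers) -> List[str]:
    word_cell = fmt(mean_boundary_diff(a["words"], b["words"]))
    pa, pb = a["phones"], b["phones"]
    if len(pa) != len(pb):
        phone_cell = f"{len(pa)} vs {len(pb)}"  # list both counts, no average
    else:
        phone_cell = fmt(mean_boundary_diff(pa, pb))
    # Leave the SOI cell blank when either TextGrid marks no segments of interest.
    if a.get("soi") and b.get("soi"):
        soi_cell = fmt(mean_boundary_diff(a["soi"], b["soi"]))
    else:
        soi_cell = ""
    sil_cell = str(sil_count(pa) - sil_count(pb))  # signed difference in 'sil' counts
    return [name, word_cell, phone_cell, soi_cell, sil_cell]


def to_csv(rows: Sequence[List[str]]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

Requiring equal segment counts before averaging keeps the comparison one-to-one: with unequal counts there is no unambiguous way to pair segments, which matches the README's rule of listing both phone counts instead of an average.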

esteng/benchmarking