Speech Embeddings

Using embedding-based loss functions for phonetics/speech recognition.

ABX-distance based embeddings:

emb_from_ab_dist.py

TODO write doc

"phn2vec" embeddings:

Phonetic annotations

There is no silver bullet: you need phonetically annotated speech corpora (e.g. TIMIT or the Buckeye corpus).

Phonemic annotations

You can also work on phonemic annotations. For that, you need to transform words into phonemes. I did a hack job using the CMU phonemic dictionary (cmudict.txt):

python timit_words_to_phonemes.py

You need the TIMIT corpus with a train.scp pointing to *.xyz files, each with a corresponding *.wrd file containing the word-level annotation (look at the constant at the start of timit_words_to_phonemes.py).
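
The word-to-phoneme step can be sketched as a dictionary lookup. This is a minimal illustration, not the code of timit_words_to_phonemes.py: the function names, the `<unk>` placeholder, and the choices to keep only the first listed pronunciation and to strip stress digits are all assumptions, and the sample lines only imitate the CMU dictionary layout.

```python
import re

def load_cmudict(lines):
    """Parse CMU-dictionary-style lines into {word: [phonemes]}."""
    pron = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phones = line.split()
        # Alternative pronunciations are marked "WORD(1)"; strip the marker.
        word = re.sub(r"\(\d+\)$", "", word).lower()
        # Assumption: keep the first pronunciation only, drop stress digits.
        pron.setdefault(word, [re.sub(r"\d", "", p).lower() for p in phones])
    return pron

def words_to_phonemes(words, pron, unknown="<unk>"):
    """Replace each word by its phonemes; unknown words map to a placeholder."""
    out = []
    for w in words:
        out.extend(pron.get(w.lower(), [unknown]))
    return out

# Sample lines in the dictionary's layout (illustrative, not read from cmudict.txt).
sample = [
    ";;; comment line",
    "CAT  K AE1 T",
    "BUTTER  B AH1 T ER0",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]
pron = load_cmudict(sample)
phones = words_to_phonemes(["cat", "butter"], pron)
```

Words missing from the dictionary need an explicit policy (skip the utterance, emit a placeholder, or add entries by hand); the placeholder used here is only one option.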

How to train the embedding? (Using word2vec from gensim)

python mlf_to_text.py < ~/postdoc/datasets/TIMIT_train_dev_test/train/train.mlf >> timit_train_from_phones.txt

or

python mlf_to_text.py --forcealigned --timitfoldings < ~/postdoc/datasets/TIMIT_train_dev_test/aligned_train.mlf >> timit_train_from_phones.txt
python train_word2vec.py timit_train_from_phones.txt
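
For orientation, here is a rough sketch of what converting an HTK master label file (MLF) to text could look like. It is not the code of mlf_to_text.py: the function name is hypothetical, and the output layout (one utterance per line, labels separated by spaces) is an assumption. It relies only on the standard MLF layout: a `#!MLF!#` header, a quoted label-file pattern opening each utterance, one label per line (optionally preceded by start and end times), and a lone `.` closing the utterance.

```python
def mlf_to_sentences(lines, forcealigned=False):
    """Yield one list of labels per utterance from the lines of an HTK MLF."""
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):
            current = []              # '"*/utt.lab"' opens a new utterance
        elif line == ".":
            if current is not None:
                yield current         # a lone dot closes the utterance
            current = None
        elif current is not None:
            fields = line.split()
            # Assumption: force-aligned entries are "start end label [score]".
            if forcealigned and len(fields) >= 3:
                current.append(fields[2])
            else:
                current.append(fields[0])

sample_mlf = """#!MLF!#
"*/utt1.lab"
0 1200000 sil
1200000 2500000 k
2500000 4100000 ae
4100000 5000000 t
.
"*/utt2.lab"
0 900000 b
900000 2000000 ah
2000000 2600000 dx
2600000 4000000 er
.
"""
sentences = list(mlf_to_sentences(sample_mlf.splitlines(), forcealigned=True))
text = "\n".join(" ".join(s) for s in sentences)
```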

Same for the Buckeye corpus.
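
A minimal sketch of the training step with gensim follows. It is not the code of train_word2vec.py: the function names and the hyperparameter values (`window`, `min_count`, skip-gram) are assumptions, and the text file is assumed to hold one utterance per line with phones separated by spaces. The gensim import is deferred so that the tokenizing helper works without gensim installed.

```python
def tokenize_lines(lines):
    """Turn text lines into token lists, skipping empty lines."""
    return [line.split() for line in lines if line.strip()]

def train_phone_embedding(lines, window=2, min_count=1, seed=0):
    """Train a skip-gram word2vec model on phone sequences (illustrative settings)."""
    from gensim.models import Word2Vec  # imported lazily, see the note above
    sentences = tokenize_lines(lines)
    # sg=1 selects skip-gram; the vector size is left at gensim's default.
    return Word2Vec(sentences=sentences, window=window,
                    min_count=min_count, sg=1, seed=seed, workers=1)

# Usage (requires gensim and the corpus file):
# with open("timit_train_from_phones.txt") as f:
#     model = train_phone_embedding(f)
# print(model.wv.most_similar("aa"))
```

Phone inventories are tiny compared to word vocabularies, so a small context window and `min_count=1` are plausible starting points, but they are guesses to tune rather than recommended values.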

Comparing two embeddings is as simple as:

python train_word2vec.py timit_train_from_phones.txt timit_train_from_words.txt

or

python train_word2vec.py timit_train_from_phones.txt buckeye_train_from_phones.txt
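
What train_word2vec.py computes when given two files is not documented here. As one possible way to compare two embeddings, the sketch below correlates their pairwise cosine similarities over the shared vocabulary. Coordinates of independently trained embeddings are not directly comparable (the spaces can differ by an arbitrary rotation), whereas pairwise similarities are. All names in the block are hypothetical, and it needs at least three shared tokens with non-constant similarities.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def embedding_agreement(emb_a, emb_b):
    """Correlate pairwise cosine similarities over the shared vocabulary."""
    shared = sorted(set(emb_a) & set(emb_b))
    pairs = list(combinations(shared, 2))
    sims_a = [cosine(emb_a[p], emb_a[q]) for p, q in pairs]
    sims_b = [cosine(emb_b[p], emb_b[q]) for p, q in pairs]
    return pearson(sims_a, sims_b)

# Toy example: emb_b is emb_a rotated by 90 degrees, plus one extra token.
emb_a = {"aa": [1.0, 0.0], "iy": [0.0, 1.0], "uw": [1.0, 1.0]}
emb_b = {"aa": [0.0, 1.0], "iy": [-1.0, 0.0], "uw": [-1.0, 1.0], "tq": [1.0, 0.0]}
agreement = embedding_agreement(emb_a, emb_b)
```

With gensim models, the two dictionaries could be built from `model.wv`, for instance `{tok: model.wv[tok] for tok in vocabulary}` for a vocabulary list of your choosing.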

Notes on the phone(me) annotations:

For the Buckeye corpus, "tq" (glottal stop in "cat") is folded to "sil".

For the TIMIT corpus, "dx" (flap in "butter") does not exist in the "words" (phonemic annotation) version.
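
Folding amounts to a label substitution applied before training. The sketch below is illustrative: the "tq" entry comes from the note above, the other two entries are common TIMIT-style foldings added as examples, and neither the mapping nor the function name is taken from timit_foldings.json or mlf_to_text.py.

```python
# Illustrative folding table; only "tq" -> "sil" is stated in the notes above.
FOLDINGS = {
    "tq": "sil",  # Buckeye glottal stop, folded to silence
    "ix": "ih",   # example of a TIMIT-style folding (assumption)
    "ax": "ah",   # example of a TIMIT-style folding (assumption)
}

def fold_phones(tokens, foldings=FOLDINGS):
    """Replace each label by its folded form; unlisted labels pass through."""
    return [foldings.get(tok, tok) for tok in tokens]

folded = fold_phones(["k", "ae", "tq", "ix", "dx"])
```

Labels without an entry, such as "dx" here, are left unchanged, so a phone that exists in one annotation and not in the other still has to be handled separately when comparing embeddings.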
