Precise CTC

CTC implementations that produce the precise path probability p(l|x), with Cython, Numba/Python, and Theano versions. The Theano implementation includes both a batch version and a non-batch version.

A longer explanation of why I "reinvented the wheel":

CTC (Connectionist Temporal Classification) plays a key role in LSTM-RNN training; with its power we are liberated from the cumbersome segmentation / alignment task. By the time of this publication, there were already plenty of Theano implementations of CTC all over the web. However, during my offline handwriting recognition research with RNNs, I sadly found that none of these open-sourced Theano implementations was able to compute the right path probability p(l|x) [1], even though they claimed successful RNN training had been done. This was really a pain in the ass. I had to get off the chair and dig into the origins of the CTC algorithm to find out what went wrong.

It took me days to read the papers, understand the algorithm, and re-implement it on my own. Finally the culprit was caught. The problem arises from how the numerical normalization is done. The CTC algorithm calculates with probability values, which are (much) less than 1.0, and this incurs underflow along the dynamic-programming recursion. In [2] Alex Graves recommends doing the calculation in log scale via

                                   ln(a + b) = ln(a) + ln(1 + exp(ln(b) - ln(a)))
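For reference, here is a minimal sketch of this log-scale addition (the `log_add` name and its handling of zero probabilities are mine, not from any of the implementations discussed):

```python
import numpy as np

def log_add(ln_a, ln_b):
    """Compute ln(a + b) given ln(a) and ln(b), using
    ln(a + b) = ln(a) + ln(1 + exp(ln(b) - ln(a))).
    The larger argument is kept outside exp() so that the
    exponent stays non-positive."""
    if ln_a < ln_b:
        ln_a, ln_b = ln_b, ln_a
    if ln_b == -np.inf:              # one of the probabilities was exactly zero
        return ln_a
    return ln_a + np.log1p(np.exp(ln_b - ln_a))
```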

Conversely, this log-scale calculation can occasionally cause numerical overflow, and this is why the above-mentioned CTC implementations failed to compute the right path probability. The solution is to use the time-step rescaling method from [3] instead of the log-scale calculation: the forward / backward variables are rescaled at each time step of the DP recursion to prevent numerical underflow. My experiments have verified the effectiveness of this method.
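Below is a minimal NumPy sketch of the rescaled forward recursion under the usual CTC conventions (blank-extended label sequence, skip connections only between distinct non-blank labels). The function name `rescaled_ctc_forward` and its interface are hypothetical and for illustration only; the actual implementations in this repo differ:

```python
import numpy as np

def rescaled_ctc_forward(y, seq):
    """Forward pass of CTC with per-time-step rescaling as in [3].
    y[t, k] is the softmax output for symbol k at frame t;
    seq is the blank-extended label sequence, e.g. [blank, l1, blank, l2, blank].
    Returns ln p(l|x) as the accumulated log of the scale factors."""
    T = y.shape[0]
    S = len(seq)
    alpha = np.zeros(S)
    alpha[0] = y[0, seq[0]]          # start in the initial blank
    if S > 1:
        alpha[1] = y[0, seq[1]]      # or in the first label
    c = alpha.sum()
    alpha /= c                       # rescale so that alpha sums to 1
    log_p = np.log(c)
    for t in range(1, T):
        new = alpha.copy()
        new[1:] += alpha[:-1]        # stay, or advance one position
        for s in range(2, S):
            # skip is allowed only between distinct non-blank labels;
            # blanks sit at even positions, so this test excludes them too
            if seq[s] != seq[s - 2]:
                new[s] += alpha[s - 2]
        new *= y[t, seq]             # multiply in the emissions for frame t
        c = new.sum()
        alpha = new / c
        log_p += np.log(c)
    # valid paths end in the final blank or the final label
    tail = (alpha[-1] + alpha[-2]) if S > 1 else alpha[-1]
    return log_p + np.log(tail)
```

A toy check against the direct path sum, for a two-frame input and the single label "a" (blank = 0, 'a' = 1):

```python
y = np.array([[0.6, 0.4],
              [0.3, 0.7]])
seq = [0, 1, 0]                                # blank-extended "a"
print(np.exp(rescaled_ctc_forward(y, seq)))    # 0.82
# paths: (a,a) 0.4*0.7 + (a,-) 0.4*0.3 + (-,a) 0.6*0.7 = 0.82
```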

One somewhat confusing fact I have to mention: in Section 7.3.1 of [2], Alex Graves states, "Note that rescaling the variables at every timestep is less robust, and can fail for very long sequences". The experiments I conducted, however, gave contradictory results.

I'd like to acknowledge the authors of [4-6]; their work, and the discussions I had with them, were of great help in developing this CTC Theano implementation.

References:
[1] Alex Graves et al., "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks", ICML, 2006.
[2] Alex Graves, "Supervised sequence labelling with recurrent neural networks", Springer, 2012.
[3] Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE, 1989.
[4] Andrew Maas et al., https://github.com/amaas/stanford-ctc/blob/master/ctc_fast/ctc-loss/ctc_fast.pyx
[5] Mohammad Pezeshki, https://github.com/mohammadpz/CTC-Connectionist-Temporal-Classification/blob/master/ctc_cost.py
[6] Shawn Tan, https://github.com/shawntan/rnn-experiment/blob/master/CTC.ipynb
