coreyryanhanson/japanese_text_classifiers


Recognizing Japanese Text

An open-source classification model for digitally handwritten Japanese characters that makes predictions from observations generated by drawing interactions in an Android app.

Hidden layers

This project started as an assignment with a deadline of only a few weeks, and it became a race against time: squeezing in an MVP (a CNN trained on an existing dataset) while building my first Android app, which was needed to create a new dataset from scratch for my actual idea. If you are here to see the initial version and notebooks, those parts have been archived here.

The more interesting results, however, are in the notebooks directory. The revision has improved accuracy, more intriguing models, and much cleaner code. All the basic ideas have been rewritten in PyTorch instead of Keras and follow more sophisticated practices.

The original (probably, but not necessarily, abandoned) roadmap:

Stage 1 - Build an OCR model using existing data from Kuzushiji-49. These observations are one step removed from the goal of this project, but they also provide an advantage for comparison and future generalization: classifying them is a harder task, since the historical kuzushiji script is less standardized.
Completed-6/16/2020

Stage 2 - Use transfer learning from the Kuzushiji-49 models to bring the smaller, app-generated dataset up to speed.
Completed-6/16/2020
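The transfer-learning step in Stage 2 can be sketched in PyTorch. This is a minimal illustration, not the project's actual architecture: `SmallCNN`, `adapt_for_new_dataset`, the layer sizes, the commented-out checkpoint path, and the 46-class target (assumed here to be the basic hiragana set) are all hypothetical. It assumes Kuzushiji-49's 28x28 grayscale inputs and 49 classes, freezes the convolutional feature extractor learned on that data, and swaps in a fresh classification head for the smaller dataset.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Hypothetical small CNN for 28x28 grayscale character images."""

    def __init__(self, num_classes: int = 49):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


def adapt_for_new_dataset(pretrained: SmallCNN, num_new_classes: int) -> SmallCNN:
    """Freeze the pretrained feature extractor and replace the output head."""
    for param in pretrained.features.parameters():
        param.requires_grad = False
    # The new head is freshly initialized and trainable by default.
    pretrained.classifier = nn.Linear(pretrained.classifier.in_features, num_new_classes)
    return pretrained


# Usage: start from a (hypothetically) Kuzushiji-49-trained model.
base = SmallCNN(num_classes=49)
# base.load_state_dict(torch.load("kuzushiji49_cnn.pt"))  # hypothetical checkpoint path
model = adapt_for_new_dataset(base, num_new_classes=46)

# Only the unfrozen (head) parameters are handed to the optimizer.
optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)

logits = model(torch.zeros(8, 1, 28, 28))
print(logits.shape)  # torch.Size([8, 46])
```

Freezing the whole feature extractor is one common choice when the new dataset is small; once more observations exist, unfreezing the later convolutional layers and fine-tuning them at a lower learning rate is another option.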

Stage 3 - Once there are enough observations, build a standalone model without the Kuzushiji data and determine the best architecture for the OCR model.
Completed-6/16/2020

Stage 4 - Explore using the raw drawing data to capture additional information, since the bitmap images inherently do not capture stroke direction or order.
Completed-6/16/2020
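The idea behind Stage 4 can be illustrated with a short, standard-library-only sketch. It assumes (hypothetically) that the app records each character as an ordered list of strokes, each an ordered list of (x, y) screen points; `to_offset_sequence` and `stroke_directions` are illustrative names, not functions from this repository. The offset sequence, similar in spirit to the (dx, dy, pen-state) format used by Sketch-RNN, preserves stroke order and boundaries, and the per-stroke angle preserves direction. Both are lost once the drawing is rasterized to a bitmap.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # points in the order they were drawn


def to_offset_sequence(strokes: List[Stroke]) -> List[Tuple[float, float, int]]:
    """Flatten ordered strokes into (dx, dy, pen_up) steps.

    dx, dy are offsets from the previous point (0, 0 for the very first point);
    pen_up is 1 on the last point of each stroke, marking a stroke boundary.
    """
    seq = []
    prev = None
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            if prev is None:
                dx, dy = 0.0, 0.0
            else:
                dx, dy = x - prev[0], y - prev[1]
            pen_up = 1 if i == len(stroke) - 1 else 0
            seq.append((dx, dy, pen_up))
            prev = (x, y)
    return seq


def stroke_directions(strokes: List[Stroke]) -> List[float]:
    """Net direction of each non-empty stroke in degrees.

    Uses screen coordinates (y grows downward): 0 = rightward, 90 = downward.
    """
    return [
        math.degrees(math.atan2(s[-1][1] - s[0][1], s[-1][0] - s[0][0]))
        for s in strokes
    ]


# Usage: a horizontal stroke followed by a vertical one.
strokes = [[(0, 0), (10, 0)], [(5, -5), (5, 10)]]
print(to_offset_sequence(strokes))
# [(0.0, 0.0, 0), (10, 0, 1), (-5, -5, 0), (0, 15, 1)]
print([round(d, 6) for d in stroke_directions(strokes)])
# [0.0, 90.0]

# The same line drawn right-to-left rasterizes identically but differs here.
print([round(d, 6) for d in stroke_directions([[(10, 0), (0, 0)]])])
# [180.0]
```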

Stage 5 - Rewrite a versatile study app that can generate observations more efficiently.
Not Started - TBA

Stage 6 - Expand to the katakana and kanji datasets.
Not Started - TBA

About

Using neural networks to classify Japanese characters
