Implementation of CRNN in recognizing Vietnamese Handwriting.

nhanphan0411/viet-ocr


Handwritten Vietnamese OCR

An implementation of a recurrent neural network (CRNN) for recognizing Vietnamese handwriting. The dataset was provided by CinnamonAI for their 2018 hackathon.


❊ RESULT

The model achieved the following error rates:

  • Character Error Rate: 0.04
  • Word Error Rate: 0.14
  • Sentence Error Rate: 0.82

For comparison, the hackathon winner scored 0.1x on Word Error Rate. The winner's results on the other metrics were not disclosed.
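The three metrics above can be illustrated with a small sketch. It assumes the usual edit-distance definitions: character and word errors are summed over all samples and divided by the total reference length, and a sentence counts as wrong if it differs from its label at all. The README does not state how the hackathon computed its scores, so the function names and the averaging scheme here are assumptions, not the project's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(labels, predictions):
    """Return (CER, WER, SER) for paired label/prediction strings.

    Assumes `labels` is non-empty and contains at least one character and word.
    """
    char_errors = char_total = 0
    word_errors = word_total = 0
    sentence_errors = 0
    for label, pred in zip(labels, predictions):
        char_errors += edit_distance(label, pred)
        char_total += len(label)
        word_errors += edit_distance(label.split(), pred.split())
        word_total += len(label.split())
        sentence_errors += label != pred  # any mismatch counts as one sentence error
    return (char_errors / char_total,
            word_errors / word_total,
            sentence_errors / len(labels))
```

Under these definitions a single wrong character makes the whole sentence count as an error, which is one reason a low Character Error Rate can coexist with a much higher Sentence Error Rate.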

Sample predictions: above - label, below - prediction


⌘ PRE-PROCESS DATA

Original

Preprocessed

  • Preprocess on the official dataset
python transform.py --path ../data/raw/0916_DataSamples_2 --type train --transform
python transform.py --path ../data/raw/1015_Private_Test --type test --transform

This creates two folders, train/ and test/, containing the preprocessed images, along with two JSON files containing the labels. All of them are written to data/.

  • Create a validation set of 15 sample images
python transform.py --path ../data/raw/0825_DataSamples_1 --type val --transform
  • Show a sample of 50 preprocessed images.
python transform.py --type [train or test or val] --sample

🕸 MODEL

The model is a CRNN trained with CTC loss. CNN blocks with skip connections (inspired by ResNet50) extract features from the input image. The extracted feature map is then passed through the LSTM layers.
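A model trained with CTC loss emits one score vector per time step, including a score for a special blank class, so its output has to be decoded into text. The sketch below shows greedy (best-path) decoding in plain Python: take the top class at each step, collapse consecutive repeats, and drop blanks. The function name, the blank index of 0, and the `alphabet` list are assumptions for illustration; the README does not show how this repository decodes its predictions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_scores: one list of per-class scores for each time step.
    alphabet: maps class index to character; index `blank` is the CTC blank
              (assumed to be 0 here).
    """
    # 1. Pick the highest-scoring class at every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # 2. Collapse consecutive repeats and 3. drop blanks.
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)
```

A blank between two identical classes is what allows a doubled letter to survive the collapsing step: the path `a a <blank> a` decodes to `aa`, while `a a a` decodes to `a`.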


🧠 TRAIN

python train.py --train

I trained the model for 30 epochs with a learning_rate of 1e-3, then decayed it to 1e-5. Training could clearly have been stopped early, at epoch 20.
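The schedule and the early-stopping remark can be sketched as two small helpers. This is a sketch under assumptions rather than the logic in train.py: epochs are counted from 0, the switch to 1e-5 is taken to happen in a single step after epoch 30, and validation loss is assumed to be the monitored quantity. The names `learning_rate_for_epoch`, `should_stop_early`, `patience` and `min_delta` are hypothetical.

```python
def learning_rate_for_epoch(epoch, switch_epoch=30, initial_lr=1e-3, final_lr=1e-5):
    """Step schedule: initial_lr for the first `switch_epoch` epochs, final_lr after."""
    return initial_lr if epoch < switch_epoch else final_lr


def should_stop_early(val_losses, patience=5, min_delta=0.0):
    """Return True when the last `patience` epochs brought no improvement.

    val_losses: validation loss per epoch, oldest first (assumed metric).
    Improvement means beating the earlier best loss by more than `min_delta`.
    """
    if len(val_losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent > best_before - min_delta
```

Calling `should_stop_early` once per epoch with the running list of losses would end training as soon as the curve flattens, instead of running a fixed number of epochs.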


🤘🏻 TEST

python train.py --test --path [path to the test images]

Example:
python3 train.py --test --path ../data/test

Predictions are saved to predictions_text.txt.
