A text recognizer in PyTorch, based on the project and best practices taught in the "Full Stack Deep Learning Course", combined with Hypermodern Python by Claudio Jolowicz.

aktersnurra/text-recognizer

Text Recognizer

This repository implements the text recognizer project from the "Full Stack Deep Learning Course" (FSDL) in PyTorch, in order to learn best practices for building a deep learning project. I have expanded on the project with additional features and ideas from Claudio Jolowicz's "Hypermodern Python".

Prerequisites

  • pyenv (or a similar tool) with Python 3.9.* installed.

  • nox for linting, formatting, and testing.

  • Poetry for dependency management and packaging.

Installation

Install Poetry and pyenv, then run:

pyenv local 3.9.*
make install

Generate Datasets

Download and generate datasets by running:

make download
make generate

Train

Use or modify an existing experiment in training/conf/experiment/, or create a new one there. To run an experiment, first enter the virtual environment:

poetry shell

Then we can train a new model by running:

python main.py +experiment=conv_transformer_paragraphs
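The `+experiment=<name>` argument looks like a Hydra-style override that selects one config from the experiment group. As a minimal sketch, assuming each experiment is a `.yaml` file directly inside training/conf/experiment/ and that the name passed on the command line is the file name without its extension, the available experiments and their training commands could be listed like this. The helpers `list_experiments` and `build_train_command` are hypothetical and not part of the repository.

```python
from pathlib import Path


def list_experiments(conf_dir):
    """Return experiment names found in conf_dir (assumed: one .yaml file each)."""
    directory = Path(conf_dir)
    if not directory.is_dir():
        return []
    return sorted(path.stem for path in directory.glob("*.yaml"))


def build_train_command(experiment):
    """Build the argument list for the training command shown above."""
    return ["python", "main.py", f"+experiment={experiment}"]


# Print one ready-to-run command per experiment config that was found.
for name in list_experiments("training/conf/experiment"):
    print(" ".join(build_train_command(name)))
```

The argument list can be passed to `subprocess.run` from inside the Poetry environment, or the printed line can be copied into the shell.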

Network

TODO: create a picture of the network and place it here.

Graveyard

Ideas of mine that unfortunately did not work:

  • EfficientNet was apparently a terrible choice of encoder

    • A ConvNext module, heavily copied from lucidrains' x-unet, was far better at encoding the images into a useful representation.
  • Use a VQ-VAE to pre-train a good latent representation

    • Tests with various compression settings showed no performance gain over training directly end-to-end; if anything, performance decreased
    • This is very unfortunate as I really hoped that this idea would work :(
    • I still really like this idea, and I might not have given up just yet...
    • I have now given up... :( ConvNext ftw
  • Axial Transformer Encoder

    • Added a lot of extra parameters with no gain in performance
    • Cool idea, but not practical on a single GPU
  • Word Pieces

    • Might have worked better, but I liked the idea of single-character recognition more.
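The last item above contrasts word pieces with single-character recognition. As a rough illustration of the character-level option, the sketch below maps every character to its own integer id, so the decoder predicts one character per step. The class name `CharacterTokenizer` and the special tokens `<p>`, `<s>`, and `<e>` are assumptions made for this example; the mapping used in the project may differ.

```python
class CharacterTokenizer:
    """Maps single characters to integer ids and back (illustrative only)."""

    # Hypothetical special tokens: padding, start of sequence, end of sequence.
    SPECIAL_TOKENS = ["<p>", "<s>", "<e>"]

    def __init__(self, characters):
        # Special tokens take the first ids; characters follow in sorted order.
        self.id_to_token = self.SPECIAL_TOKENS + sorted(set(characters))
        self.token_to_id = {token: i for i, token in enumerate(self.id_to_token)}

    def encode(self, text, max_length):
        """Wrap text in start/end tokens and pad to max_length."""
        ids = [self.token_to_id["<s>"]]
        ids += [self.token_to_id[char] for char in text]
        ids.append(self.token_to_id["<e>"])
        if len(ids) > max_length:
            raise ValueError("text does not fit into max_length")
        padding = [self.token_to_id["<p>"]] * (max_length - len(ids))
        return ids + padding

    def decode(self, ids):
        """Turn ids back into text, stopping at the first end token."""
        chars = []
        for i in ids:
            token = self.id_to_token[i]
            if token == "<e>":
                break
            if token not in self.SPECIAL_TOKENS:
                chars.append(token)
        return "".join(chars)


tokenizer = CharacterTokenizer("abcdefghijklmnopqrstuvwxyz ")
ids = tokenizer.encode("hello world", max_length=16)
print(ids, tokenizer.decode(ids))
```

A word-piece vocabulary would shorten the target sequences at the cost of a larger output layer; the character-level mapping keeps the vocabulary tiny and needs no tokenizer training.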

Todo

  • remove einops (try)
  • Tests
  • Evaluation
  • Wandb artifact fetcher
  • fix linting
  • Modularize the decoder
  • Add kv cache
  • Train with Laprop
  • Fix stems
  • residual attn
  • single kv head
  • fix rotary embedding
  • simplify attention with norm
  • tie embeddings
  • cnn -> tf encoder -> tf decoder
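For the "Add kv cache" item, the idea can be shown without any framework. The sketch below is a toy single-head, single-layer causal attention in plain Python, not the project's decoder: all function names and the weight matrices are hypothetical. The baseline re-projects the keys and values of every earlier token at each decoding step, while the cached variant projects only the newest token and reuses the stored keys and values.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def decode_full(tokens, w_q, w_k, w_v):
    """Baseline: recompute keys and values for all earlier tokens at every step."""
    outputs, projections = [], 0
    for t in range(len(tokens)):
        keys = [matvec(w_k, x) for x in tokens[: t + 1]]
        values = [matvec(w_v, x) for x in tokens[: t + 1]]
        projections += 2 * (t + 1)
        outputs.append(attend(matvec(w_q, tokens[t]), keys, values))
    return outputs, projections


def decode_cached(tokens, w_q, w_k, w_v):
    """Cached: project only the newest token and reuse the stored keys and values."""
    keys, values, outputs, projections = [], [], [], 0
    for x in tokens:
        keys.append(matvec(w_k, x))
        values.append(matvec(w_v, x))
        projections += 2
        outputs.append(attend(matvec(w_q, x), keys, values))
    return outputs, projections
```

Both functions return the same outputs because, under causal attention, the keys and values of earlier positions never change. For three tokens the baseline performs 12 key/value projections and the cached version 6; the gap grows quadratically with sequence length.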
