Text Recognizer

An implementation of the text recognizer project from the "Full Stack Deep Learning" (FSDL) course in PyTorch, built to learn best practices for deep learning projects. I have expanded on the project with additional features and ideas from Claudio Jolowicz's "Hypermodern Python".

Prerequisites

pyenv (or similar) with Python 3.9.* installed.
nox for linting, formatting, and testing.
Poetry for dependency and project management.

Installation

Install Poetry and pyenv, then run:

pyenv local 3.9.*
make install

Generate Datasets

Download and generate datasets by running:

make download
make generate

Train

Use, modify, or create an experiment in training/conf/experiment/. To run an experiment, first enter the virtual environment:

poetry shell

Then train a new model by running:

python main.py +experiment=conv_transformer_paragraphs
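To see which experiment names can be passed to the command above, a small helper along these lines might be useful. This is only a sketch: it assumes each experiment is a single .yaml file directly under training/conf/experiment/, and the function names list_experiments and train_command are hypothetical, not part of the project.

```python
from pathlib import Path


def list_experiments(conf_dir: str = "training/conf/experiment") -> list[str]:
    """Return experiment names (file stems) found in conf_dir, sorted."""
    root = Path(conf_dir)
    if not root.is_dir():
        # No such directory: report no experiments instead of raising.
        return []
    # Assumption: one .yaml file per experiment, named after the experiment.
    return sorted(p.stem for p in root.glob("*.yaml"))


def train_command(experiment: str) -> str:
    """Build the training command from this section for one experiment."""
    return f"python main.py +experiment={experiment}"


if __name__ == "__main__":
    # Print one ready-to-run command per experiment found.
    for name in list_experiments():
        print(train_command(name))
```

Run it from the repository root inside the poetry shell so the default relative path resolves.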

Network

TODO: add a diagram of the network here.

Graveyard

Ideas of mine that unfortunately did not work:

EfficientNet turned out to be a terrible choice of encoder
- A ConvNext module, heavily borrowed from lucidrains' x-unet, was far better at encoding the images into a useful representation.
Using a VQVAE to pre-train a good latent representation
- Tests with various compression levels showed no performance increase compared to training directly end to end; if anything, performance decreased.
- This is very unfortunate as I really hoped that this idea would work :(
- I still really like this idea, and I might not have given up just yet...
- I have now given up... :( ConvNext ftw
Axial Transformer Encoder
- Added a lot of extra parameters with no gain in performance
- Cool idea, but not practical on a single GPU
Word Pieces
- Might have worked better, but I liked the idea of single-character recognition more.
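
For context on the last item, single-character recognition means the model predicts one token per character rather than per word piece. A minimal sketch of such an encoding follows; the vocabulary, the special tokens, and the names encode and decode are assumptions made for illustration, not the project's actual mapping.

```python
import string

# Hypothetical vocabulary: special tokens first, then printable characters.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>"]
CHARACTERS = list(string.ascii_letters + string.digits + string.punctuation + " ")
VOCAB = SPECIAL_TOKENS + CHARACTERS
CHAR_TO_INDEX = {token: i for i, token in enumerate(VOCAB)}


def encode(text: str) -> list[int]:
    """Map text to one index per character, wrapped in start/end tokens.

    Raises KeyError for characters outside the hypothetical vocabulary.
    """
    body = [CHAR_TO_INDEX[c] for c in text]
    return [CHAR_TO_INDEX["<start>"], *body, CHAR_TO_INDEX["<end>"]]


def decode(indices: list[int]) -> str:
    """Invert encode, dropping the special tokens."""
    return "".join(VOCAB[i] for i in indices if VOCAB[i] not in SPECIAL_TOKENS)
```

The target sequence grows by one token per character, which is the trade-off against word pieces: longer sequences, but a small fixed vocabulary.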
