
YongWookHa/im2latex


Im2LaTeX

Reads a formula image and translates it into LaTeX markup, similar to Show, Attend and Tell and to Harvard's paper and dataset.

I modified the model structure, starting from Show, Attend and Tell.

Overview

This repository is built on the base template of Pytorch Template, which is a by-product of the original Pytorch Project Template. Check the template repositories before getting started.

The main differences from Show, Attend and Tell are that I replaced the row encoder with positional encoding and reduced the maximum sequence length to 40. With these changes, I reached a perplexity of 1.0717 with reliable performance.
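The README does not spell out the exact encoding, so the following is only a minimal sketch. It assumes the standard sinusoidal formulation from the Transformer paper (Vaswani et al., 2017), applied to the CNN feature map after flattening it row by row. The repository's actual variant may differ; for example, it could use a 2D encoding. The `perplexity` helper shows what the reported number measures: a perplexity of 1.0717 corresponds to a mean cross-entropy of ln(1.0717) ≈ 0.069 nats per token. All function names here are hypothetical.

```python
import math


def sinusoidal_encoding(num_positions, d_model):
    """Standard 1D sinusoidal positional encoding (Vaswani et al., 2017).

    Returns `num_positions` rows of length `d_model`; even dimensions
    use sin and odd dimensions use cos.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positional_encoding(features):
    """Add positional encodings to a flattened feature map.

    `features` holds one vector per spatial cell (H*W cells, read row by
    row), all with the same dimension.
    """
    d_model = len(features[0])
    pe = sinusoidal_encoding(len(features), d_model)
    return [[f + p for f, p in zip(vec, enc)] for vec, enc in zip(features, pe)]


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Because position is injected additively, the attention decoder can tell feature cells apart without a recurrent row encoder.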


im2latex results (predicted token sequences for two example formula images):

\partial _ { \mu } ( F ^ { \mu \nu } - e j ^ { \mu } x ^ { \nu } ) : 0 .

e x p \left( - \frac { \partial } { \partial \alpha _ { j } } \theta ^ { i k } \frac { \partial } { \partial \alpha _ { k } } \right)
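These results are sequences of space-separated LaTeX tokens. Below is a minimal sketch of how such strings could be mapped to integer ids for a decoder and back again. The special-token names (`<pad>`, `<start>`, `<end>`, `<unk>`) and the helper functions are hypothetical and are not taken from the repository.

```python
def tokenize(formula):
    """Split a space-separated LaTeX formula into tokens."""
    return formula.split()


def build_vocab(formulas, specials=("<pad>", "<start>", "<end>", "<unk>")):
    """Map every token to an integer id; special tokens get the first ids."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for formula in formulas:
        for tok in tokenize(formula):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(formula, vocab):
    """Convert a formula to ids wrapped in <start> ... <end>; unseen tokens become <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(formula)]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]


def decode(ids, vocab):
    """Inverse of encode: drop <pad>/<start>/<end> and rejoin with spaces."""
    inv = {i: tok for tok, i in vocab.items()}
    skip = {"<pad>", "<start>", "<end>"}
    return " ".join(inv[i] for i in ids if inv[i] not in skip)
```

For example, the first result above splits into 28 tokens, which is well under the 40-token limit.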

Usage

1. Data Preprocess

Thanks to untrix, we can get a refined LaTeX dataset from https://untrix.github.io/i2l/.

He documents his data processing strategy, so you can follow his preprocessing steps. If you are in a hurry, you can simply download the Full Dataset instead.

You will then have an ASCII LaTeX formula text dataset of around 140K formulas. Although untrix's dataset also includes the full formula images, I recommend rendering the images yourself from the LaTeX text.
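Because the model uses a maximum sequence length of 40, longer formulas can be dropped before training. The sketch below does this and then splits the remainder into training and validation sets. It assumes the formulas are already loaded as a list of space-tokenized strings. Whether the repository's limit also counts start/end tokens is not stated, so that detail is an assumption, as are the 10% validation ratio and the seed.

```python
import random

MAX_LEN = 40  # maximum sequence length mentioned in the Overview


def filter_by_length(formulas, max_len=MAX_LEN):
    """Keep non-empty formulas whose whitespace-token count is at most max_len."""
    return [f for f in formulas if 0 < len(f.split()) <= max_len]


def train_valid_split(items, valid_ratio=0.1, seed=0):
    """Shuffle deterministically, then return (train, valid) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_valid = int(len(items) * valid_ratio)
    return items[n_valid:], items[:n_valid]
```

A fixed seed keeps the split reproducible across preprocessing runs.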

You can use the sympy library to render formulas from LaTeX text. With data/custom_preprocess_v2.py, you can render two types of formula images, with and without the Euler font, depending on a variable.
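The following is a minimal sketch of the rendering step, not the actual contents of data/custom_preprocess_v2.py. sympy's `preview` function accepts a raw LaTeX string and has an `euler` flag that loads the Euler font package. Rendering each formula once with `euler=False` and once with `euler=True` therefore yields the two image types. The file-naming scheme and helper names are hypothetical.

```python
import os


def variant_paths(index, out_dir):
    """Output paths for the plain and the Euler-font rendering of one formula."""
    return {
        False: os.path.join(out_dir, f"{index}.png"),
        True: os.path.join(out_dir, f"{index}_euler.png"),
    }


def render_formula(formula, out_path, euler):
    """Render one LaTeX formula to a PNG file with sympy.preview.

    Needs a working LaTeX installation with dvipng; the Euler variant
    also needs the LaTeX `euler` package.
    """
    # Imported lazily so the path helper above works without sympy installed.
    from sympy import preview

    # A string is passed to LaTeX as-is, so wrap it in display math.
    preview(f"$${formula}$$", viewer="file", filename=out_path, euler=euler)


def render_both(index, formula, out_dir):
    """Render the same formula with and without the Euler font."""
    os.makedirs(out_dir, exist_ok=True)
    paths = variant_paths(index, out_dir)
    for euler, path in paths.items():
        render_formula(formula, path, euler)
    return paths
```

For example, `render_both(0, r"\frac { a } { b }", "images")` would write images/0.png and images/0_euler.png. The spaces between tokens are harmless because LaTeX ignores them in math mode.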

2. Edit the JSON config file

If your data paths differ, edit configs/draft.json:

"debug": false,
"train_img_path" : "YOUR PATH",
"valid_img_path" : "YOUR PATH",
"train_formula_path" : "YOUR PATH",
"valid_formula_path" : "YOUR PATH"
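Before training, it can help to confirm that none of these keys still holds the placeholder. Below is a minimal sketch that does this, assuming configs/draft.json is a flat JSON object containing the four path keys above. The helper names are hypothetical.

```python
import json
import os

# Path keys from the config excerpt above.
PATH_KEYS = (
    "train_img_path",
    "valid_img_path",
    "train_formula_path",
    "valid_formula_path",
)


def load_config(config_file):
    """Read a JSON config file into a dict."""
    with open(config_file, encoding="utf-8") as fh:
        return json.load(fh)


def missing_paths(config, keys=PATH_KEYS):
    """Return the keys that are unset, still "YOUR PATH", or not on disk."""
    problems = []
    for key in keys:
        path = config.get(key)
        if not path or path == "YOUR PATH" or not os.path.exists(path):
            problems.append(key)
    return problems
```

`missing_paths(load_config("configs/draft.json"))` returns an empty list once every path points to an existing file or directory.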

3. Train

Copy configs/draft.json to configs/train.json, then change the mode to train:

# configs/train.json

"mode": "train",
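The copy-and-edit step can also be scripted. Below is a minimal sketch, assuming the config is a flat JSON object whose top-level keys can be overridden directly. The helper names are hypothetical, not part of the repository.

```python
import json


def apply_overrides(config, **overrides):
    """Return a copy of `config` with the given top-level keys replaced."""
    updated = dict(config)
    updated.update(overrides)
    return updated


def derive_config(src, dst, **overrides):
    """Read the JSON config at `src`, apply overrides, and write it to `dst`."""
    with open(src, encoding="utf-8") as fh:
        config = json.load(fh)
    config = apply_overrides(config, **overrides)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=4)
    return config
```

For example, `derive_config("configs/draft.json", "configs/train.json", mode="train")` produces the training config. The same helper could write configs/predict.json by overriding the keys listed in the Predict step.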

In a terminal, run main.py with your custom train.json:

python main.py configs/train.json  

4. Predict

Copy configs/draft.json to configs/predict.json, then set the mode to predict and fill in the test image path and the checkpoint file:

"exp_name": "im2latex-draft",
"mode": "predict",
"test_img_path" : "YOUR PATH",
"checkpoint_filename" : "YOUR PATH"

In a terminal, run main.py with your custom predict.json:

python main.py configs/predict.json
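The README does not describe how main.py turns model outputs into a formula. Greedy decoding is a common choice for this kind of image-to-sequence model. The toy sketch below caps decoding at the 40-token maximum sequence length from the Overview. `step` is a hypothetical stand-in for the trained model, and the repository may use a different strategy, such as beam search.

```python
def greedy_decode(step, start_id, end_id, max_len=40):
    """Generic greedy decoding loop.

    `step(prefix)` is a hypothetical model callable: given the ids
    generated so far (starting with `start_id`), it returns a list of
    scores over the vocabulary for the next token. Decoding stops at
    `end_id` or after `max_len` generated tokens.
    """
    ids = [start_id]
    for _ in range(max_len):
        scores = step(ids)
        # Pick the highest-scoring token id (first one wins on ties).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == end_id:
            break
        ids.append(next_id)
    return ids[1:]  # drop the start token
```

The returned ids could then be converted back into a space-separated LaTeX string with the vocabulary built at training time.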

Enjoy the code.