Receipt scanner

School project for scanning printed text (receipts).

Prerequisites

Install Tesseract:
```
sudo apt install tesseract-ocr
```
To add support for other languages in Tesseract, download the matching *.traineddata file from github.com/tesseract-ocr/tessdata_fast and place it in the tessdata/ directory of your Tesseract installation (e.g. /usr/share/tesseract-ocr/4.00/tessdata).
- Polish language support: pol.traineddata
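To double-check that a language file ended up in the right place, you can look for it on disk (the tesseract CLI can also list installed languages with tesseract --list-langs). The sketch below uses a hypothetical helper, has_language, that is not part of this project; the default directory is only the example path above and varies between distributions and Tesseract versions.

```python
from pathlib import Path

# Assumption: example location from the instructions above; adjust for your system.
DEFAULT_TESSDATA_DIR = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def has_language(lang: str, tessdata_dir=DEFAULT_TESSDATA_DIR) -> bool:
    """Return True if <lang>.traineddata exists in tessdata_dir (hypothetical helper)."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    for lang in ("eng", "pol"):
        print(f"{lang}: {'installed' if has_language(lang) else 'missing'}")
```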

Reader - OCR and stuff

Reader is meant to take an input image (e.g. a photo taken with a smartphone) and output the formatted contents of the scanned receipt. To achieve this, it goes through the following steps:
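The overall flow (preprocessing, then OCR, then parsing) can be sketched as a small pipeline object. This is a hypothetical Reader class for illustration only; the project's actual interfaces are not described here, so each stage is just an injected callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical stage signatures (not taken from the project's code).
Preprocess = Callable[[Any], Any]  # image -> image
Ocr = Callable[[Any], str]         # image -> raw text
Parse = Callable[[str], Any]       # raw text -> structured receipt


@dataclass
class Reader:
    """Sketch of the Reader flow: preprocessing steps, then OCR, then parsing."""
    preprocessing_steps: List[Preprocess]
    ocr: Ocr
    parse: Parse

    def read(self, image: Any) -> Any:
        # Apply each preprocessing step in order (e.g. crop, rescale, blur, threshold).
        for step in self.preprocessing_steps:
            image = step(image)
        # Recognize text, then hand it to the selected parsing strategy.
        return self.parse(self.ocr(image))
```

Keeping each stage as a separate callable makes it easy to swap in a different parsing strategy per store layout without touching the preprocessing or OCR code.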

Image preprocessing

To improve the accuracy of the OCR module, we preprocess the images first:

- Crop - "cut out" the receipt from the original image.
- Rescaling - Tesseract works best on images of at least 300 dpi.
- Blurring - reduces noise.
- Thresholding - converts the grayscale image to black and white (binarization).
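The rescaling, blurring and thresholding steps above can be illustrated in pure Python on a grayscale image stored as rows of 0..255 values. This is a minimal sketch, not the project's implementation: a real pipeline would more likely use a library such as OpenCV (e.g. cv2.GaussianBlur and cv2.threshold with cv2.THRESH_OTSU). The blur here is a plain 3x3 mean (box) filter and the threshold is chosen with Otsu's method; both are swapped-in choices, since the project does not say which ones it uses. Phone photos carry no meaningful dpi, so effective_dpi assumes a paper width (80 mm, a common thermal receipt width).

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of 0..255 values
TARGET_DPI = 300


def effective_dpi(width_px: int, paper_width_mm: float = 80.0) -> float:
    """Estimate resolution from pixel width and an ASSUMED physical paper width."""
    return width_px / (paper_width_mm / 25.4)


def scale_factor(dpi: float, target_dpi: int = TARGET_DPI) -> float:
    """Upscale factor needed to reach target_dpi; never downscales."""
    return max(1.0, target_dpi / dpi)


def box_blur(img: Image) -> Image:
    """3x3 mean filter; border pixels average only their in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            out[y][x] = total // count
    return out


def otsu_threshold(img: Image) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    weight_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Pixels above t become white (255), the rest black (0)."""
    return [[255 if p > t else 0 for p in row] for row in img]
```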

Cropping works best when the receipt is visible in its entirety (all four corners of the paper sheet have to be in the frame). Pictures should ideally have a dark, uniform background; otherwise the receipt may not be detected correctly.
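Needing all four corners makes sense if the crop is done as a perspective correction: once the corners are found, they are ordered and mapped onto an upright rectangle. The sketch below assumes the corners were already detected (for example with OpenCV's cv2.findContours and cv2.approxPolyDP) and only computes the source and destination corners that cv2.getPerspectiveTransform and cv2.warpPerspective would then use (as float32 arrays). It is an illustration, not the project's code; the sum/difference ordering trick can misorder corners for receipts rotated by roughly 45 degrees.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def order_corners(points: List[Point]) -> Tuple[Point, Point, Point, Point]:
    """Return (top-left, top-right, bottom-right, bottom-left); y grows downwards."""
    if len(points) != 4:
        raise ValueError("expected exactly four corners")
    by_sum = sorted(points, key=lambda p: p[0] + p[1])   # x + y: TL smallest, BR largest
    by_diff = sorted(points, key=lambda p: p[1] - p[0])  # y - x: TR smallest, BL largest
    return by_sum[0], by_diff[0], by_sum[-1], by_diff[-1]


def warp_target(points: List[Point]):
    """Ordered source corners, destination rectangle corners and output (width, height)."""
    tl, tr, br, bl = order_corners(points)
    width = round(max(hypot(tr[0] - tl[0], tr[1] - tl[1]),
                      hypot(br[0] - bl[0], br[1] - bl[1])))
    height = round(max(hypot(bl[0] - tl[0], bl[1] - tl[1]),
                       hypot(br[0] - tr[0], br[1] - tr[1])))
    dst = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return [tl, tr, br, bl], dst, (width, height)
```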

Optical Character Recognition

For OCR we use Tesseract through pytesseract, its Python wrapper.
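A minimal sketch of calling Tesseract through pytesseract on a preprocessed image. The page segmentation mode is an assumption: --psm 6 ("assume a single uniform block of text") often suits receipts, but --psm 4 (a single column of text of variable sizes) is worth trying too. ocr_image and build_config are hypothetical helpers, and lang="pol" requires pol.traineddata to be installed as described in Prerequisites.

```python
def build_config(psm: int = 6) -> str:
    """Tesseract CLI options passed through pytesseract (page segmentation mode)."""
    return f"--psm {psm}"


def ocr_image(path: str, lang: str = "pol", psm: int = 6) -> str:
    """Run Tesseract on an image file and return the recognized text."""
    # Imported lazily so build_config works without pytesseract/Pillow installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang, config=build_config(psm))
```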

Parsing string data according to a selected parsing strategy

The problem with parsing the contents of receipts is that every store has a different receipt layout. Because of that, we had to create separate parsing strategies for different layouts.
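One way to organize per-layout parsing is the strategy pattern: a common interface plus one implementation per store layout, selected by name. The layout below is hypothetical (one item per line, price at the end with a comma as decimal separator, optionally followed by a tax letter), as are the class names; treating lines starting with "SUMA" as the total line is also an assumption about the receipts being scanned.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass
class Item:
    name: str
    price: Decimal


class ParsingStrategy(ABC):
    @abstractmethod
    def parse(self, text: str) -> List[Item]:
        """Turn raw OCR text into a list of items."""


class NamePriceLineStrategy(ParsingStrategy):
    """Hypothetical layout: 'Name ... 3,49 C' (price, then an optional tax letter)."""
    LINE = re.compile(r"^(?P<name>.+?)\s+(?P<price>\d+,\d{2})\s*[A-Z]?$")

    def parse(self, text: str) -> List[Item]:
        items = []
        for line in text.splitlines():
            line = line.strip()
            if line.upper().startswith("SUMA"):  # assumed total line, not an item
                continue
            m = self.LINE.match(line)
            if m:
                items.append(Item(m["name"], Decimal(m["price"].replace(",", "."))))
        return items


STRATEGIES: Dict[str, ParsingStrategy] = {"default": NamePriceLineStrategy()}


def parse_receipt(text: str, layout: str = "default") -> List[Item]:
    return STRATEGIES[layout].parse(text)
```

Adding support for a new store then means registering another ParsingStrategy subclass in STRATEGIES rather than changing existing parsers.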

How do I run it?

Well, you don't... at this point.

Repository: pniewiejski/receipt_scanner