Document Scanner with OCR

Detects documents in simple images, processes them, and extracts the text they contain. Uses OpenCV v3.0 for document processing and Tesseract v3.03 for optical character recognition (OCR).

Usage: python docscan.py -i data/Sample1.jpg

Roadmap:

- Release a standalone Linux desktop version and smartphone versions of this project.
- Improve the accuracy of the project. Possible areas for improvement:
  - Automate parts of the Tesseract training module to make training easier.
  - Apply different levels of smoothing to the Tesseract input, compare the text outputs, and merge them for higher accuracy.
  - Study Tesseract's methods in depth and try to modify them for greater accuracy.
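
The "compare and merge" roadmap idea could be prototyped roughly as follows. This is only a sketch and not part of docscan.py: it assumes the OCR readings of the same page (for example, from differently smoothed copies of the input) are already available as plain strings, and the function name `merge_ocr_outputs` is hypothetical. It aligns each reading to the first one with the standard-library `difflib` module and keeps the most frequent word at each position.

```python
from collections import Counter
from difflib import SequenceMatcher


def merge_ocr_outputs(outputs):
    """Merge several OCR readings of one page by word-level majority vote.

    Hypothetical helper: the first non-empty reading is used as the
    alignment reference, which is a simplifying assumption.
    """
    token_lists = [text.split() for text in outputs if text.strip()]
    if not token_lists:
        return ""

    reference = token_lists[0]
    # One vote counter per word position in the reference reading.
    votes = [Counter({word: 1}) for word in reference]

    for tokens in token_lists[1:]:
        matcher = SequenceMatcher(a=reference, b=tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only spans covering the same number of words can be compared
            # position by position; insertions and deletions are skipped.
            if tag in ("equal", "replace") and (i2 - i1) == (j2 - j1):
                for offset in range(i2 - i1):
                    votes[i1 + offset][tokens[j1 + offset]] += 1

    # On a tie, the word from the reference reading is kept.
    return " ".join(counter.most_common(1)[0][0] for counter in votes)


if __name__ == "__main__":
    readings = [
        "The quick brovvn fox",  # e.g. no smoothing
        "The quick brown fox",   # e.g. light smoothing
        "The qu1ck brown fox",   # e.g. heavy smoothing
    ]
    print(merge_ocr_outputs(readings))  # The quick brown fox
```

Word-level voting only helps when the readings disagree in different places. Words that are dropped or split differently across readings are ignored in this sketch, so a real implementation would likely need a more careful alignment.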
