bdheath/OCRPDF

OCRPDF

OCRPDF is a Python wrapper that helps you quickly OCR multi-page PDF documents.

Requirements

You must have already installed:

Dependencies

You must also have installed the following Python modules:

Basic Usage

To create a new instance of OCRPDF and OCR a file:

from OCRPDF import OCRPDF

ocrTool = OCRPDF()
# Pass the path of the PDF you want to OCR
result = ocrTool.OCRPDF('YourFileNameHere')

This returns an object with the following attributes:

    t         : raw text
    t_clean   : cleaned text
    pages     : number of pages
    p         : list of page data objects, each with:
                pagenum : page number
                t       : raw text from this page
                t_clean : cleaned text from this page
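To make the shape of the result concrete without running any OCR, the sketch below builds a stand-in object with the same attribute names using types.SimpleNamespace and walks its pages. The page strings are placeholders, and joining page texts with newlines to form the document-level t and t_clean is an assumption made only for illustration; the summarize helper is hypothetical and not part of OCRPDF.

```python
from types import SimpleNamespace

# Hypothetical stand-in mirroring the attribute names listed above.
# The strings are placeholders, not real OCR output.
pages = [
    SimpleNamespace(pagenum=n, t=f"raw text {n}", t_clean=f"clean text {n}")
    for n in range(1, 4)
]

# Assumption for illustration: document text is the page texts joined by newlines.
result = SimpleNamespace(
    t="\n".join(page.t for page in pages),
    t_clean="\n".join(page.t_clean for page in pages),
    pages=len(pages),
    p=pages,
)


def summarize(result):
    """Hypothetical helper: one line per page with its page number and cleaned text."""
    return [f"Page {page.pagenum}: {page.t_clean}" for page in result.p]


for line in summarize(result):
    print(line)
```

With a real result from ocrTool.OCRPDF(...), the same loop over result.p should work as long as the attributes match the list above.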

So to view the raw text from page 3 of your document:

print(result.p[2].t)

(It's p[2] rather than p[3] because Python lists are zero-indexed.)
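If you prefer to ask for pages by their printed number, a small wrapper can do the off-by-one conversion for you. This is a hypothetical helper, not part of OCRPDF; it assumes only the p, t and t_clean attributes described above, and the demo object is a placeholder standing in for real OCR output.

```python
from types import SimpleNamespace


def page_text(result, page_number, clean=False):
    """Hypothetical helper: return the text for a 1-based page number.

    Converts the human page number to the 0-based index used by result.p
    and raises IndexError for pages outside the document.
    """
    if not 1 <= page_number <= len(result.p):
        raise IndexError(f"page {page_number} is outside 1..{len(result.p)}")
    page = result.p[page_number - 1]
    return page.t_clean if clean else page.t


# Placeholder three-page result standing in for ocrTool.OCRPDF(...) output.
demo = SimpleNamespace(
    p=[SimpleNamespace(pagenum=n, t=f"raw {n}", t_clean=f"clean {n}") for n in range(1, 4)]
)

print(page_text(demo, 3))              # raw text of page 3
print(page_text(demo, 1, clean=True))  # cleaned text of page 1
```

Raising IndexError on an out-of-range page keeps the behaviour consistent with indexing result.p directly, while rejecting page 0 (which would otherwise silently return the last page via p[-1]).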
