pdftables

pdftables uses pdfminer to get information on the locations of text elements in a PDF document.

First we get a file handle to a PDF:

import os
filepath = os.path.join(PDF_TEST_FILES, SelectedPDF)
fh = open(filepath, 'rb')
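The handle above is never closed. The sketch below wraps the open call in a `with` block so the file is closed once you are done with it. It keeps the handle open while your work function runs. That is the safe assumption here, because pdfminer reads objects from the stream on demand. `with_pdf_handle` is a hypothetical helper, not part of pdftables.

```python
import os
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def with_pdf_handle(directory: str, filename: str, work: Callable[[BinaryIO], T]) -> T:
    """Open directory/filename in binary mode, pass the handle to `work`,
    and close the file when `work` returns or raises."""
    filepath = os.path.join(directory, filename)
    with open(filepath, "rb") as fh:
        # All PDF parsing must happen inside this block, while fh is still open.
        return work(fh)
```

For example, `table, diagnosticData = with_pdf_handle(PDF_TEST_FILES, SelectedPDF, lambda fh: pageToTables(getPDFPage(fh, pagenumber)))` extracts a table and closes the file afterwards.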

Then we use our getPDFPage function to select a single page from the document, and pass that page to pageToTables to extract the table:

pdfPage = getPDFPage(fh, pagenumber)
table, diagnosticData = pageToTables(pdfPage, extend_y=False, hints=hints, atomise=False)

Setting the optional extend_y parameter to True extends the grid used to extract the table to the full height of the page. The optional hints parameter is a two-element list of strings: the first element should contain unique text from the top of the table, and the second should contain unique text from the bottom row of the table. Setting the optional atomise parameter to True splits all the text into individual characters. This is slower, but it can sometimes separate closely spaced columns.
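The sketch below illustrates the idea behind hints: use two unique strings to bound the vertical extent of the table. It is not pdftables' actual implementation. `TextElement` and `rows_between_hints` are hypothetical names, and the sketch makes three assumptions. It assumes PDF coordinates, in which y increases up the page. It assumes each hint matches a single text element. It also assumes every cell in a row shares the same y0, so a real layout would need some tolerance.

```python
from typing import List, NamedTuple, Sequence


class TextElement(NamedTuple):
    text: str
    y0: float  # PDF coordinates: y grows upwards from the bottom of the page


def _find(elements: Sequence[TextElement], needle: str) -> TextElement:
    """Return the first element whose text contains `needle`."""
    for element in elements:
        if needle in element.text:
            return element
    raise ValueError(f"hint text not found on page: {needle!r}")


def rows_between_hints(elements: Sequence[TextElement], hints: Sequence[str]) -> List[TextElement]:
    """Keep the elements lying between the top hint (hints[0]) and the
    bottom-row hint (hints[1]), inclusive."""
    top_hint, bottom_hint = hints
    top = _find(elements, top_hint)
    bottom = _find(elements, bottom_hint)
    if top.y0 < bottom.y0:
        raise ValueError("top hint lies below bottom hint")
    return [e for e in elements if bottom.y0 <= e.y0 <= top.y0]
```

Anything above the header or below the bottom row, such as page titles and footnotes, is dropped before any column detection happens.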

table is a list of lists of strings. diagnosticData is an object containing diagnostic information which can be displayed using the plotpage function:
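Because table is a plain list of lists of strings, the standard library csv module can write it out directly. `table_to_csv` is a hypothetical helper, not part of pdftables.

```python
import csv
import io
from typing import List


def table_to_csv(table: List[List[str]]) -> str:
    """Serialise a list-of-lists-of-strings table to CSV text.
    csv.writer quotes any cell containing a comma, quote or newline."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default line terminator is "\r\n"
    writer.writerows(table)
    return buffer.getvalue()
```

To write straight to disk instead, open the file with `open("table.csv", "w", newline="")` and pass that handle to `csv.writer`.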

fig, ax1 = plotpage(diagnosticData)

pdftables - a library for extracting tables from PDF files