Python script used to extract some data from a PDF file: pages as JPG files, pages as SVG files, thumbnails of each page, and the table of contents.

bumblebeefr/pdf-data-extractor

pdf data extractor

A quick and dirty Python script I used to extract some data from a PDF file:

  • All pages as JPG files
  • All pages as SVG files
  • A thumbnail of each page
  • The table of contents of the document, as a JSON document.
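
As a rough illustration of the last item, a PDF outline can be read with pdfminer and dumped as JSON. This is only a sketch and not necessarily how the script does it: it assumes the pdfminer.six module layout (`PDFParser`, `PDFDocument.get_outlines`), and the helper names `outline_to_toc`, `read_outline` and `toc_as_json`, as well as the `level` and `title` JSON keys, are hypothetical.

```python
import json


def outline_to_toc(entries):
    """Turn (level, title) pairs into a list of JSON-friendly dicts."""
    return [{"level": level, "title": title} for level, title in entries]


def read_outline(pdf_path):
    """Read (level, title) pairs from a PDF outline with pdfminer.

    Assumes the pdfminer.six module layout. The import is done lazily so
    the other helpers work even when pdfminer is not installed.
    """
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument

    with open(pdf_path, "rb") as handle:
        document = PDFDocument(PDFParser(handle))
        # get_outlines() yields (level, title, dest, action, se) tuples
        # and raises PDFNoOutlines when the PDF has no outline.
        return [(level, title) for level, title, *_ in document.get_outlines()]


def toc_as_json(entries):
    """Serialize (level, title) pairs as an indented JSON string."""
    return json.dumps(outline_to_toc(entries), indent=2, ensure_ascii=False)
```

With these helpers, `toc_as_json(read_outline("myfile.pdf"))` would return the outline as a JSON string, provided the PDF actually contains one.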

Usage

python pdf-data-extractor.py myfile.pdf

This generates a folder named myfile and extracts the data into it.
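
The mapping from the input file to the output folder can be sketched as follows. This is an illustration, not the script's actual code: the function names `output_dir_for` and `parse_args` are hypothetical, and it assumes the folder name is simply the PDF file name without its extension.

```python
import argparse
import os


def output_dir_for(pdf_path):
    """Return the output folder name: the PDF file name without extension."""
    return os.path.splitext(os.path.basename(pdf_path))[0]


def parse_args(argv=None):
    """Parse the single positional argument, the PDF file to process."""
    parser = argparse.ArgumentParser(description="Extract data from a PDF file.")
    parser.add_argument("pdf", help="path to the PDF file to process")
    return parser.parse_args(argv)
```

For example, `output_dir_for(parse_args(["myfile.pdf"]).pdf)` gives `myfile`, which could then be created with `os.makedirs(..., exist_ok=True)`.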

Dependencies

It depends on some Python libraries:

  • argparse
  • pdfminer
  • Python Imaging Library (PIL)

and on some Linux command-line tools:

  • pdftoppm
  • pdf2svg
  • convert (ImageMagick)
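
Since the script relies on external tools, it can help to check that they are installed before running it. The sketch below does that and also shows one plausible way to invoke each tool. It is an assumption about usage, not the script's actual code: the helper names, the `page` and `thumb` file prefixes and the `200x200` thumbnail size are all hypothetical.

```python
import shutil

REQUIRED_TOOLS = ("pdftoppm", "pdf2svg", "convert")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def build_page_commands(pdf_path, out_dir):
    """Build argument lists for rendering all pages (not executed here)."""
    return [
        # JPEG pages; pdftoppm appends the (possibly zero-padded) page number.
        ["pdftoppm", "-jpeg", pdf_path, f"{out_dir}/page"],
        # SVG pages; pdf2svg replaces %d with the page number.
        ["pdf2svg", pdf_path, f"{out_dir}/page-%d.svg", "all"],
    ]


def thumbnail_command(source_image, thumbnail_path, size="200x200"):
    """Build an ImageMagick command that writes a thumbnail of one page."""
    return ["convert", source_image, "-thumbnail", size, thumbnail_path]
```

Each returned list can be passed to `subprocess.run(command, check=True)` once `missing_tools()` returns an empty list.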
