Python PDFParser.set_document Examples

Programming language: Python

Namespace/Package: pdfparser

Class/Type: PDFParser

Method/Function: set_document

Examples at hotexamples.com: 4

Python PDFParser.set_document - 4 examples found. These are the top-rated real-world examples of pdfparser.PDFParser.set_document extracted from open source projects.

Frequently used methods

PDFParser(6)

parse(3)

get_processed_stems(2)

set_document(2)

get_text(1)

Example #1

```python
def process_pdf(rsrcmgr, device, fp, pagenos=None, maxpages=0, password='',
                check_extractable=True):
    # Create a PDF parser object associated with the file object.
    parser = PDFParser(fp)
    # Create a PDF document object that stores the document structure.
    doc = PDFDocument()
    # Connect the parser and document objects.
    parser.set_document(doc)
    doc.set_parser(parser)
    # Supply the document password for initialization.
    # (If no password is set, give an empty string.)
    doc.initialize(password)
    # Check if the document allows text extraction. If not, abort.
    if check_extractable and not doc.is_extractable:
        raise PDFTextExtractionNotAllowed(
            'Text extraction is not allowed: %r' % fp)
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    # Process each page contained in the document.
    for (pageno, page) in enumerate(doc.get_pages()):
        if pagenos and (pageno not in pagenos):
            continue
        interpreter.process_page(page)
        if maxpages and maxpages <= pageno + 1:
            break
    return
```
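The "connect the parser and document objects" step in the example is a two-way link: each object is handed a reference to the other. The sketch below illustrates only that linking pattern. `SketchParser`, `SketchDocument`, and `connect` are hypothetical stand-ins invented for this illustration; they are not pdfminer's classes and do no PDF parsing.

```python
class SketchParser:
    """Hypothetical stand-in for a parser; NOT pdfminer's PDFParser."""

    def __init__(self, fp):
        self.fp = fp
        self.doc = None

    def set_document(self, doc):
        # Remember which document object this parser feeds.
        self.doc = doc


class SketchDocument:
    """Hypothetical stand-in for a document; NOT pdfminer's PDFDocument."""

    def __init__(self):
        self.parser = None

    def set_parser(self, parser):
        # Remember which parser supplies this document's objects.
        self.parser = parser


def connect(fp):
    """Build both objects and link them in the same order as the example."""
    parser = SketchParser(fp)
    doc = SketchDocument()
    parser.set_document(doc)
    doc.set_parser(parser)
    return parser, doc


if __name__ == "__main__":
    parser, doc = connect(fp=None)
    print(parser.doc is doc, doc.parser is parser)
```

Both calls are needed in this pattern because neither object learns about the other at construction time.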

Example #2

File: pdfinterp.py Project: ktisha/ebook-service

```python
def process_pdf(rsrc, device, fp, pagenos=None, maxpages=0, password=''):
    doc = PDFDocument()
    parser = PDFParser(fp)
    parser.set_document(doc)
    doc.set_parser(parser)
    doc.initialize(password)
    if not doc.is_extractable:
        raise PDFTextExtractionNotAllowed(
            'Text extraction is not allowed: %r' % fp)
    interpreter = PDFPageInterpreter(rsrc, device)
    for (pageno, page) in enumerate(doc.get_pages()):
        if pagenos and (pageno not in pagenos):
            continue
        interpreter.process_page(page)
        if maxpages and maxpages <= pageno + 1:
            break
    return
```

Example #3

```python
def process_pdf(rsrc, device, fp, pagenos=None, maxpages=0, password=''):
    doc = PDFDocument()
    parser = PDFParser(fp)
    parser.set_document(doc)
    doc.set_parser(parser)
    doc.initialize(password)
    if not doc.is_extractable:
        raise PDFTextExtractionNotAllowed(
            'Text extraction is not allowed: %r' % fp)
    interpreter = PDFPageInterpreter(rsrc, device)
    for (pageno, page) in enumerate(doc.get_pages()):
        if pagenos and (pageno not in pagenos):
            continue
        interpreter.process_page(page)
        if maxpages and maxpages <= pageno + 1:
            break
    return
```

Example #4

File: pdfinterp_altered.py Project: srbbins/ETD_Processing_Scripts

```python
def process_pdf(rsrcmgr, device, fp, pagenos=None, maxpages=0, password='',
                caching=True, check_extractable=True):
    # Create a PDF parser object associated with the file object.
    parser = PDFParser(fp)
    # Create a PDF document object that stores the document structure.
    doc = PDFDocument(caching=caching)
    # Connect the parser and document objects.
    parser.set_document(doc)
    doc.set_parser(parser)
    # Supply the document password for initialization.
    # (If no password is set, give an empty string.)
    doc.initialize(password)
    # Check if the document allows text extraction. If not, abort.
    if check_extractable and not doc.is_extractable:
        raise PDFTextExtractionNotAllowed(
            'Text extraction is not allowed: %r' % fp)
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    # Process each page contained in the document.
    for (pageno, page) in enumerate(doc.get_pages()):
        if pagenos and (pageno not in pagenos):
            continue
        interpreter.process_page(page)
        if maxpages and maxpages <= pageno + 1:
            break
    return
```
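All four examples end with the same page-selection loop driven by `pagenos` and `maxpages`. As a rough illustration, that loop can be isolated into a generator and exercised without any PDF library. The helper name `select_pages` and the string "pages" are assumptions made for this sketch; they do not appear in the projects above.

```python
def select_pages(pages, pagenos=None, maxpages=0):
    """Yield the pages that the loop in the examples would process.

    Hypothetical helper: it mirrors the loop's control flow only.
    """
    for pageno, page in enumerate(pages):
        # Skip pages not explicitly requested (when a filter is given).
        if pagenos and pageno not in pagenos:
            continue
        yield page
        # Stop once the zero-based index reaches the maxpages limit.
        # A falsy maxpages (0) means "no limit".
        if maxpages and maxpages <= pageno + 1:
            break


if __name__ == "__main__":
    pages = ["a", "b", "c", "d"]  # stand-ins for page objects
    print(list(select_pages(pages, pagenos={0, 2})))             # ['a', 'c']
    print(list(select_pages(pages, maxpages=2)))                 # ['a', 'b']
    print(list(select_pages(pages, pagenos={2}, maxpages=2)))    # ['c']
```

The last call shows a subtlety of the loop as written: `maxpages` is compared against the page index, and only after a page has been processed, so a skipped page never triggers the `break`. A requested page beyond the limit is still processed once before the loop stops.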