Python Document.pages Examples

Programming Language: Python

Namespace/Package Name: fitz

Class/Type: Document

Method/Function: pages

Examples at hotexamples.com: 2

Python Document.pages - 2 examples found. These are the top-rated real-world Python examples of fitz.Document.pages, extracted from open source projects.

Example #1

File: pdfxmeta.py Project: Krasjet/pdf.tocgen

import re
from typing import List, Optional

from fitz import Document


def extract_meta(doc: Document,
                 pattern: str,
                 page: Optional[int] = None,
                 ign_case: bool = False) -> List[dict]:
    """Extract meta for a `pattern` on `page` in a PDF document

    Arguments
      doc: document from pymupdf
      pattern: a regular expression pattern
      page: page number (1-based index); if None is given, the entire
            document is searched, but this is highly discouraged.
      ign_case: whether to ignore case when matching
    """
    result = []

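    if page is None:
        # doc.pages() yields every page of the document in order
        pages = doc.pages()
    elif 1 <= page <= doc.pageCount:
        # pageCount is the legacy camelCase name; newer PyMuPDF
        # releases spell it page_count
        pages = [doc[page - 1]]  # 1-based page number -> 0-based index
    else:  # page out of range
        return result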

    flags = re.IGNORECASE if ign_case else 0
    regex = re.compile(pattern, flags)

    # we could parallelize this, but I don't see a reason
    # to *not* specify a page number
    # (search_in_page is a helper from the same project, not shown here)
    for p in pages:
        result.extend(search_in_page(regex, p))

    return result
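
The page-selection branch in Example #1 can be tried out in isolation. The following is a minimal sketch, not the project's code: `select_pages` and `fake_doc` are hypothetical names introduced here, and a plain list stands in for the document so that the snippet runs without PyMuPDF. It assumes only that the document supports `len()` and integer indexing.

```python
from typing import List, Optional, Sequence


def select_pages(doc: Sequence, page: Optional[int] = None) -> List:
    """Return the pages to search, given a 1-based page number.

    `doc` only needs len() and integer indexing in this sketch.
    """
    if page is None:
        # No page given: fall back to every page in the document.
        return [doc[i] for i in range(len(doc))]
    if 1 <= page <= len(doc):
        # Convert the 1-based page number to a 0-based index.
        return [doc[page - 1]]
    # Out-of-range page numbers select nothing.
    return []


# Hypothetical stand-in for a three-page document.
fake_doc = ["page-1", "page-2", "page-3"]

print(select_pages(fake_doc, 2))  # ['page-2']
print(select_pages(fake_doc))     # ['page-1', 'page-2', 'page-3']
print(select_pages(fake_doc, 7))  # []
```

Returning an empty list for an out-of-range page mirrors the early `return result` in the example: the caller gets no matches rather than an exception.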

Example #2

File: recipe.py Project: Krasjet/pdf.tocgen

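from typing import List

from fitz import Document

# Recipe and ToCEntry are defined in the pdf.tocgen project (not shown here)


def extract_toc(doc: Document, recipe: Recipe) -> List[ToCEntry]:
    """Extract toc entries from a document

    Arguments
      doc: a pdf document
      recipe: recipe from user
    Returns
      a list of toc entries in the document
    """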
    result = []

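    for page in doc.pages():
        # getTextPage is the legacy camelCase name; newer PyMuPDF
        # releases spell it get_textpage
        for blk in page.getTextPage().extractDICT().get('blocks', []):
            result.extend(
                # page.number is 0-based, so add 1 for a 1-based page number
                recipe.extract_block(blk, page.number + 1)
            )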

    return result
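
To make the nested loop in Example #2 more concrete, here is a rough sketch of walking a page dictionary of the kind `extractDICT()` returns. It assumes the usual layout of blocks containing lines containing spans. `iter_span_text` and `sample_page` are hypothetical names, and the sample dictionary is hand-written for illustration rather than taken from a real PDF. The project's `Recipe.extract_block` is not shown in the source, so it is not reproduced here.

```python
from typing import Dict, Iterator, Tuple


def iter_span_text(page_dict: Dict, page_number: int) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, text) for every text span in a page dictionary."""
    for block in page_dict.get("blocks", []):
        # Blocks without a "lines" key (for example image blocks) are skipped.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield page_number, span.get("text", "")


# Hand-written sample in the assumed blocks -> lines -> spans layout.
sample_page = {
    "blocks": [
        {"type": 0, "lines": [{"spans": [{"text": "1 Introduction", "size": 14.0}]}]},
        {"type": 1},  # image block: no "lines" key
        {"type": 0, "lines": [{"spans": [{"text": "Some body text", "size": 10.0}]}]},
    ]
}

for number, text in iter_span_text(sample_page, 1):
    print(number, text)
```

Using `.get(..., [])` at each level follows the same defensive pattern as the `.get('blocks', [])` call in the example, so a page with no text simply yields nothing.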