Python get_plaintext_document_body Examples

Programming language: Python

Namespace/package name: refextract.references.engine

Method/function: get_plaintext_document_body

Examples at hotexamples.com: 2

Python get_plaintext_document_body - 2 examples found. These are the top-rated real-world Python examples of refextract.references.engine.get_plaintext_document_body, extracted from open source projects.

Example #1

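import pytest

from refextract.references.engine import get_plaintext_document_body
# Assumed location of the exception class within the refextract package.
from refextract.references.errors import UnknownDocumentTypeError


def test_get_plaintext_document_body(tmpdir):
    # A plain-text file comes back as a list of its lines.
    lines = [u"Some text\n", u"on multiple lines\n"]
    f = tmpdir.join("plain.txt")
    f.write("".join(lines))
    assert lines == get_plaintext_document_body(str(f))

    # Any other document type (here HTML) is rejected, and the detected
    # MIME type is passed along in the exception arguments.
    html = "<html><body>Some page</body></html>"
    f = tmpdir.join("page.html")
    f.write(html)
    with pytest.raises(UnknownDocumentTypeError) as excinfo:
        get_plaintext_document_body(str(f))
    assert 'text/html' in excinfo.value.args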
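
The behavior this test checks can be mimicked with the standard library alone, which may help when reading the assertions. The sketch below is not refextract's implementation: `read_plaintext_body` and the local `UnknownDocumentTypeError` class are hypothetical stand-ins, and guessing the type from the file extension with `mimetypes` is an assumption made only for illustration.

```python
import mimetypes


class UnknownDocumentTypeError(Exception):
    """Hypothetical stand-in for the refextract exception of the same name."""


def read_plaintext_body(path):
    """Return the lines of a plain-text file, or raise for other types."""
    # Assumption: the type is guessed from the file extension only.
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type != "text/plain":
        # Carry the detected type in the exception arguments so callers
        # can inspect it, as the test above does with excinfo.value.args.
        raise UnknownDocumentTypeError(mime_type)
    with open(path, encoding="utf-8") as handle:
        # readlines() keeps the trailing newline on each line.
        return handle.readlines()
```

Under these assumptions, calling the stand-in on a `.txt` file returns its lines with newlines intact, while a `.html` path raises the stand-in exception with `'text/html'` in its arguments.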

Example #2

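from refextract.references.engine import get_plaintext_document_body


def test_clean_pdf_before_run(tmp_path, pdf_files):
    # pdf_files is not a built-in pytest fixture; it is presumably
    # provided elsewhere in the project's test suite.
    tmp_file_path = tmp_path / "packed.pdf"
    pdf = pdf_files[7]
    # Copy the sample PDF into the temporary directory before extracting.
    with open(pdf, 'rb') as src, open(tmp_file_path, 'wb') as tmp_out:
        tmp_out.write(src.read())

    # The extracted body is a list of lines; '\x0c' is a form feed.
    text = get_plaintext_document_body(tmp_file_path.as_posix())
    assert text == ['Test\n', '\x0c']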