Python HTMLDocument.get_title examples

Programming language: Python

Namespace/package name: cosrlib.document.html

Class/type: HTMLDocument

Method/function: get_title

Examples on hotexamples.com: 4

Python HTMLDocument.get_title: 4 examples found. These are real-world examples of cosrlib.document.html.HTMLDocument.get_title, extracted from open source projects.
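As Examples #3 and #4 show, get_title returns the text of the document's <title> element. The snippet below is a minimal stand-in built on the standard library's html.parser, not cosrlib's implementation (which, per the comments in Examples #3 and #4, parses through gumbo). The TitleExtractor class and extract_title function are hypothetical names, and returning None when no title is found is an assumption.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Hypothetical stand-in: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Assumption: a missing or empty title is reported as None.
        return "".join(self._parts).strip() or None


def extract_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title("<html><head><title>Mac\u00e9o</title></head></html>"))  # Macéo
```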

Frequently used methods

HTMLDocument(23)

parse(6)

get_all_words(2)

get_hyperlinks(2)

get_title(2)

get_domain_paid_words(1)

get_external_hyperlinks(1)

get_internal_hyperlinks(1)

get_summary(1)

get_url(1)

get_url_words(1)

get_word_groups(1)

parse_canonical_url(1)

Example #1

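from cosrlib.document.html import HTMLDocument


def test_parsing_samples(sample_name):
    # SAMPLES (not shown) maps each sample file name to its expected metadata
    metadata = SAMPLES[sample_name]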

    sample_file = "tests/testdata/html_page_samples/%s" % sample_name
    with open(sample_file, "r") as f:
        html = f.read()

        page = HTMLDocument(html).parse()

        if "title" in metadata:
            assert metadata["title"] == page.get_title()

        if "summary" in metadata:
            assert metadata["summary"] == page.get_summary()

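        # Uncomment to dump the word groups while debugging:
        # for k, g in sorted(page.get_word_groups().items()):
        #     print(k, g)

        words = page.get_all_words()
        # Lowercase once into a set: case-insensitive, constant-time lookups
        lower_words_set = {w.lower() for w in words}

        # Set "debug" in the sample's metadata to print the extracted words
        if metadata.get("debug"):
            print(words)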

        for word in metadata.get("assert_words_missing", []):
            assert word not in lower_words_set

        for word in metadata.get("assert_words", []):
            assert word in lower_words_set
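Example #1 is data-driven: each SAMPLES entry holds optional keys (title, summary, debug, assert_words, assert_words_missing) that decide which assertions run. The word checks can be factored into a helper, sketched below. check_sample_words and the sample metadata entry are hypothetical, and the plain list stands in for what page.get_all_words() returns. Because the metadata words are compared against a lowercased set without being lowercased themselves, they must be written in lowercase.

```python
def check_sample_words(words, metadata):
    """Hypothetical helper mirroring the word assertions of Example #1."""
    # Lowercase once into a set: case-insensitive, O(1) average lookups.
    lower_words_set = {w.lower() for w in words}

    if metadata.get("debug"):
        print(words)

    for word in metadata.get("assert_words_missing", []):
        assert word not in lower_words_set, "unexpected word: %r" % word

    for word in metadata.get("assert_words", []):
        assert word in lower_words_set, "missing word: %r" % word


# Hypothetical entry, shaped like the SAMPLES metadata used above.
SAMPLES = {
    "example.html": {
        "title": "Example Domain",
        "assert_words": ["example", "domain"],
        "assert_words_missing": ["cookie"],
    },
}

check_sample_words(["Example", "Domain", "illustrative"], SAMPLES["example.html"])
```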

Example #2

File: test_samples.py; project: bakztfuture/cosr-back

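from cosrlib.document.html import HTMLDocument


def test_parsing_samples(sample_name):
    # SAMPLES (not shown) maps each sample file name to its expected metadata
    metadata = SAMPLES[sample_name]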

    sample_file = "tests/testdata/html_page_samples/%s" % sample_name
    with open(sample_file, "r") as f:
        html = f.read()

        page = HTMLDocument(html).parse()

        if "title" in metadata:
            assert metadata["title"] == page.get_title()

        if "summary" in metadata:
            assert metadata["summary"] == page.get_summary()

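        # Uncomment to dump the word groups while debugging:
        # for k, g in sorted(page.get_word_groups().items()):
        #     print(k, g)

        words = page.get_all_words()

        # Set "debug" in the sample's metadata to print the extracted words
        if metadata.get("debug"):
            print(words)

        # Checked against the raw word list, so these lookups are case-sensitive
        for word in metadata.get("assert_words_missing", []):
            assert word not in words

        for word in metadata.get("assert_words", []):
            assert word in words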
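Example #2 differs from Example #1 only in its word checks: it tests membership against the raw list returned by get_all_words(), so the comparison is case-sensitive and each lookup scans the list, whereas Example #1 lowercases the words into a set first. The difference appears as soon as a page capitalizes a word; the word list below is purely illustrative.

```python
words = ["Hello", "World", "example"]  # illustrative stand-in for get_all_words()

# Example #2 style: exact, case-sensitive membership on the list.
print("world" in words)  # False

# Example #1 style: lowercase into a set first.
lower_words_set = {w.lower() for w in words}
print("world" in lower_words_set)  # True
```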

Example #3

File: test_encoding.py; project: JBaba/cosr-back

def test_reparse():
    from cosrlib.document.html import HTMLDocument

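    # Python 2 byte string: "\xe9" is "é" in ISO-8859-15
    doc = HTMLDocument("""<html><head><meta charset="iso-8859-15"><title>Mac\xe9o</title></head></html>""")
    # Detection picks up the charset declared in <meta>
    assert doc.encoding.detect().name == "iso8859-15"

    # Re-parsing should be triggered here, because gumbo only accepts UTF-8
    doc.parse()

    # The title comes back UTF-8 encoded: "é" becomes "\xc3\xa9"
    assert doc.get_title() == "Mac\xc3\xa9o"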
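Example #3 rests on two encoding facts that can be checked with the standard library alone. Byte 0xE9 is "é" in ISO-8859-15 and becomes the two bytes 0xC3 0xA9 once re-encoded as UTF-8, the form the test's comment says gumbo requires. Python's codec registry also normalizes the label "iso-8859-15" to "iso8859-15", the name the test expects from doc.encoding.detect(). Whether cosrlib uses codecs.lookup internally is an assumption; the sketch below, written for Python 3 with a hypothetical reencode_to_utf8 helper, only illustrates the conversion.

```python
import codecs


def reencode_to_utf8(raw, declared_charset):
    """Hypothetical helper: decode with the declared charset, re-encode as UTF-8."""
    codec = codecs.lookup(declared_charset)  # raises LookupError for unknown labels
    return raw.decode(codec.name).encode("utf-8")


raw = b"<title>Mac\xe9o</title>"
print(codecs.lookup("iso-8859-15").name)     # iso8859-15
print(reencode_to_utf8(raw, "iso-8859-15"))  # b'<title>Mac\xc3\xa9o</title>'
```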

Example #4

File: test_encoding.py; project: x0rzkov/cosr-back

def test_reparse():
    from cosrlib.document.html import HTMLDocument

    doc = HTMLDocument(
        """<html><head><meta charset="iso-8859-15"><title>Mac\xe9o</title></head></html>"""
    )
    assert doc.encoding.detect().name == "iso8859-15"

    # Re-parsing should be triggered here, because gumbo only accepts UTF-8
    doc.parse()

    assert doc.get_title() == "Mac\xc3\xa9o"