Python clean_html Examples

Programming language: Python

Namespace/package name: dila2sql.html_utils

Method/function: clean_html

Examples at hotexamples.com: 13

Python clean_html - 13 examples found. These are the top-rated real-world Python examples of dila2sql.html_utils.clean_html, extracted from open source projects.

Example #1

File: test_html.py Project: SocialGouv/dila2sql

from dila2sql.html_utils import clean_html


def test_clean_html_drops_spaces_around_line_breaks():
    # Basic
    unclean = '<p>\t Lorem ipsum\n </p>'
    cleaned = clean_html(unclean)
    expected = '<p>Lorem ipsum</p>'
    assert cleaned == expected
    # Complex
    unclean = '<p> <i> \nLorem <br/> ipsum\n </i> </p>'
    cleaned = clean_html(unclean)
    expected = '<p><i>Lorem<br/>ipsum</i></p>'
    assert cleaned == expected

Example #2

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_does_not_collapse_spaces_inside_pre():
    unclean = '''
        <pre>    print("&gt; Hello world")
        </pre>
    '''
    actual = clean_html(unclean)
    expected = unclean.strip()
    assert actual == expected

Example #3

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_drops_useless_attributes_and_elements():
    unclean = '''
        <h1 align="center">Titre <font>1</font></h1>
        <p id="foo"><span align="left"></span></p>
    '''
    cleaned = clean_html(unclean)
    expected = '<h1 align="center">Titre 1</h1>'
    assert cleaned == expected

Example #4

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_drops_empty_elements_and_text_nodes():
    unclean = '''
        <p>Lorem ipsum</p>
        <p> <pre> </pre> </p>
    '''
    cleaned = clean_html(unclean)
    expected = '<p>Lorem ipsum</p>'
    assert cleaned == expected

Example #5

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_preserves_attribute_order():
    expected = '<h1 a="0" b="1" c="2" d="3" e="4">Titre</h1>'
    actual = clean_html(expected)
    assert actual == expected

Example #6

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_escapes_properly():
    original = '<p attr="&quot;">&lt;p&gt;</p>'
    actual = clean_html(original)
    expected = '<p attr="&#34;">&lt;p&gt;</p>'
    assert actual == expected
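
The test above expects `&quot;` in the input to come back as `&#34;`. Both are character references for the same double-quote character, so the output is rewritten but still equivalent. A minimal sketch using only the standard library `html` module shows this; `same_after_decoding` is a hypothetical helper introduced for illustration and is not part of dila2sql.

```python
import html

# Named and numeric references that both stand for a double quote.
named = html.unescape('&quot;')
numeric = html.unescape('&#34;')


def same_after_decoding(a: str, b: str) -> bool:
    """Return True when two escaped strings decode to identical text.

    Hypothetical helper for illustration only; it decodes character
    references and does not parse the HTML structure.
    """
    return html.unescape(a) == html.unescape(b)
```

Comparing the test's `original` and `expected` strings with this helper would report them as equal after decoding, even though they differ byte for byte.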

Example #7

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_on_single_whitespace():
    r = clean_html(' ')
    assert r == ''

Example #8

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_does_not_alter_clean_html():
    expected = '<h1 align="center">Titre</h1><p>Lorem ipsum &amp;</p>'
    actual = clean_html(expected)
    assert actual == expected

Example #9

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_does_not_drop_empty_table_cells():
    unclean = '<tr><th></th><td> </td></tr><tr> </tr>'
    cleaned = clean_html(unclean)
    expected = '<tr><th/><td/></tr>'
    assert cleaned == expected

Example #10

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_drops_line_breaks_at_the_beginning():
    unclean = ' <br/> <p> <br/> <br/> Text</p>'
    cleaned = clean_html(unclean)
    expected = '<p>Text</p>'
    assert cleaned == expected

Example #11

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_drops_bad_spaces():
    unclean = "L' <span>article 2</span>\n."
    cleaned = clean_html(unclean)
    expected = "L'article 2."
    assert cleaned == expected

Example #12

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_on_empty_string():
    r = clean_html('')
    assert r == ''

Example #13

File: test_html.py Project: SocialGouv/dila2sql

def test_clean_html_collapses_spaces():
    unclean = '<s> Lorem \r <b><i> ipsum</i> dolor\n\t</b>sit </s>'
    cleaned = clean_html(unclean)
    expected = '<s>Lorem <b><i>ipsum</i> dolor</b> sit</s>'
    assert cleaned == expected
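
The whitespace collapsing exercised by the test above can be illustrated on plain text with a short sketch. This is not how dila2sql implements `clean_html`: `collapse_whitespace` is a hypothetical helper that ignores tags entirely, so it would not reproduce the tag-aware results the tests expect (for example, it would not leave `<pre>` content untouched).

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse each run of whitespace into one space and trim both ends.

    Hypothetical plain-text helper for illustration; it knows nothing
    about HTML elements.
    """
    # \s matches spaces, tabs, carriage returns and newlines alike.
    return re.sub(r'\s+', ' ', text).strip()
```

Applied to a string such as `' Lorem \r  ipsum dolor\n\t'`, the helper leaves single spaces between the words and none at the ends.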