# Example 1
async def test_extract_disconnect_urls():
    """Check that _extract_disconnect_urls() finds every logout-like link on a page.

    The mocked page contains one neutral link plus six links whose path ends in a
    disconnect keyword (logout, logoff, signout, signoff, disconnect, déconnexion);
    all six — and only those six — must be extracted, with relative URLs resolved
    against the page URL.
    """
    # NOTE(review): async test — presumably runs under @pytest.mark.asyncio and
    # @respx.mock decorators defined just above this block; confirm in full file.
    target_url = "http://perdu.com/"
    respx.get(target_url).mock(return_value=httpx.Response(
        200,
        text=
        "<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1> \
            <h2>Pas de panique, on va vous aider</h2> \
            <strong><pre>    * <----- vous &ecirc;tes ici</pre></strong><a href='http://perdu.com/foobar/'></a> \
            <a href='http://perdu.com/foobar/logout'></a> \
            <a href='http://perdu.com/foobar/logoff'></a> \
            <a href='http://perdu.com/foobar/signout'></a> \
            <a href='http://perdu.com/foobar/signoff'></a> \
            <a href='http://perdu.com/foobar/disconnect'></a> \
            <a href='../../foobar/déconnexion'></a> \
            </div></body></html>"))

    crawler = AsyncCrawler(Request(target_url), timeout=1)

    page = await crawler.async_get(Request(target_url))

    disconnect_urls = crawler._extract_disconnect_urls(page)

    # The relative '../../foobar/déconnexion' link must have been resolved
    # to an absolute URL rooted at target_url.
    test_disconnect_urls = [
        "http://perdu.com/foobar/logout", "http://perdu.com/foobar/logoff",
        "http://perdu.com/foobar/signout", "http://perdu.com/foobar/signoff",
        "http://perdu.com/foobar/disconnect",
        "http://perdu.com/foobar/déconnexion"
    ]

    # Same length + full membership == same contents (order-insensitive).
    # `is True` removed: all() already returns a bool, so the identity
    # comparison was redundant (and an is-literal anti-pattern).
    assert len(disconnect_urls) == len(test_disconnect_urls)
    assert all(url in disconnect_urls for url in test_disconnect_urls)
# Example 2
def test_extract_disconnect_urls_no_url():
    """A page whose links contain no logout-like keyword must yield no disconnect URL."""
    target_url = "http://perdu.com/"

    # Mock the target with a page holding only neutral links.
    html_payload = "<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1> \
            <h2>Pas de panique, on va vous aider</h2> \
            <strong><pre>    * <----- vous &ecirc;tes ici</pre></strong><a href='http://perdu.com/foobar/'></a> \
            <a href='http://perdu.com/foobar/foobar'></a></body></html>"
    respx.get(target_url).mock(return_value=httpx.Response(200, text=html_payload))

    crawler = AsyncCrawler(Request(target_url), timeout=1)

    # Fetch the mocked page synchronously and wrap it for extraction.
    response = httpx.get(target_url, follow_redirects=False)
    page = Page(response)

    # No link matches a disconnect keyword, so nothing must be extracted.
    assert len(crawler._extract_disconnect_urls(page)) == 0