Python Crawler._normalize_url Examples

Programming language: Python

Namespace/Package: Crawler

Class/Type: Crawler

Method/Function: _normalize_url

Examples at hotexamples.com: 1

Python Crawler._normalize_url - 1 example found. This is the top-rated real-world example of Crawler.Crawler._normalize_url extracted from open source projects.

Frequently used methods

Crawler(30)

crawl(15)

click(5)

close(4)

crawl_native(4)

getPage(3)

_process_next_url(2)

crawl_and_createfile(2)

add_to_dirlist(2)

crawl_multithread(2)

_process_html_link(2)

_process_html_asset(2)

_process_html(2)

save_crawler_data(2)

save_lists(2)

_make_request(2)

__init__(2)

render_sitemap(2)

crawling_process(1)

create_file(1)

create_view(1)

getCurrentPage(1)

getLinkStructure(1)

crawling(1)

crawl_own_albums(1)

Crawl(1)

getNextPage(1)

getPage2(1)

getTreeIndex(1)

getVisited(1)

hasNext(1)

join(1)

loadConf(1)

printLinkStructure(1)

process_q(1)

startCrawl(1)

startCrawling(1)

go(1)

crawl_index(1)

crawl_one(1)

baidu_search(1)

SLEEP_TIME(1)

URL_LIMIT(1)

_normalize_url(1)

_parse_url(1)

add(1)

addNewWorks(1)

add_target_full_profile(1)

add_target_short_profile(1)

all(1)

bad_urls(1)

boot(1)

crawl_messages(1)

cases_annotation(1)

check_compete(1)

closeExtraTabs(1)

company_info(1)

crawl_board(1)

crawl_friends(1)

crawl_friends_pinboard(1)

crawl_items(1)

crawl_linked_albums(1)

crawl_linked_photos(1)

update_address(1)

Example #1

File: test_crawler.py Project: vhamid/crawler-example

def test__nromalize_url(self):
    # Maps each input URL to the normalized string the crawler should return.
    # Note: urlparse is the Python 2 module name (urllib.parse in Python 3).
    test_list = {
        "http://www.a.com#abc": "http://www.a.com/",
        "http://www.a.com/a/b/c": "http://www.a.com/a/b/c",
        # if no scheme is provided, urlsplit treats the domain name as the path
        # so we don't expect a trailing "/" after www.a.com
        "www.a.com?abc=123#abc": "://www.a.com?abc=123"
    }
    for test in test_list:
        usplit = urlparse.urlsplit(test)
        c = Crawler("http://mydomain.com")
        self.assertEqual(c._normalize_url(usplit), test_list[test])
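
The test above shows only the expected inputs and outputs of _normalize_url, not its body. The following is a minimal sketch of a function that would satisfy those three cases, written for Python 3 with urllib.parse. The name normalize_url and its logic are assumptions inferred from the test alone, not the project's actual implementation: it drops the fragment, keeps the query string, and substitutes "/" for an empty path.

```python
from urllib.parse import urlsplit


def normalize_url(parts):
    """Hypothetical stand-in for Crawler._normalize_url (inferred from the test).

    `parts` is the SplitResult returned by urlsplit().
    """
    # An empty path becomes "/", so "http://www.a.com" gains a trailing slash.
    path = parts.path or "/"
    # The fragment (the part after "#") is dropped by never reading it.
    url = parts.scheme + "://" + parts.netloc + path
    # The query string is kept when present.
    if parts.query:
        url += "?" + parts.query
    return url


if __name__ == "__main__":
    for raw in ("http://www.a.com#abc",
                "http://www.a.com/a/b/c",
                "www.a.com?abc=123#abc"):
        print(raw, "->", normalize_url(urlsplit(raw)))
```

Without a scheme, urlsplit places "www.a.com" in the path rather than the netloc, so the sketch produces the leading "://" and no trailing slash, matching the comment in the test.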