Python url_normalize examples

Programming language: Python

Namespace/package name: urlnorm

Method/function: url_normalize

Examples on hotexamples.com: 2

Python url_normalize: 2 examples found. These are top-rated, real-world examples of urlnorm.url_normalize in Python, extracted from open-source projects.
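
URL normalization maps different spellings of the same address to one canonical string, so a crawler can tell whether it has already seen a URL. The sketch below is a simplified, hypothetical simple_url_normalize built only on the standard library's urllib.parse. It is not the urlnorm implementation, and the real url_normalize may apply different or additional rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Ports that are implied by the scheme and can be dropped (only http/https handled here)
DEFAULT_PORTS = {"http": 80, "https": 443}


def simple_url_normalize(url: str) -> str:
    """Return a simplified canonical form of an absolute URL (hypothetical helper)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # .hostname is lowercased by urllib; note that any user:password@ prefix is discarded
    host = parts.hostname or ""
    if ":" in host:  # put the brackets back around IPv6 literals
        host = f"[{host}]"
    port = parts.port
    # keep an explicit port only when it differs from the scheme's default
    netloc = host if port is None or DEFAULT_PORTS.get(scheme) == port else f"{host}:{port}"
    path = parts.path or "/"  # an empty path refers to the root
    # the #fragment is dropped because it is never sent to the server
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

For example, HTTP://Example.COM:80/a?b=1#top and http://example.com/a?b=1 both map to http://example.com/a?b=1 under this sketch.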

Example #1

File: utils.py  Project: rohit-nsit08/Balerion

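    import heapq

    from urlnorm import url_normalize

    # Excerpt: a method of the URL queue class in utils.py (class definition not shown).
    # self.heap is a list used as a heapq priority queue; self.hashtable is a set-like
    # collection of URLs that have already been queued.
    def add(self, element, priority):
        """
        Push element onto the heap unless its normalized form was already queued.
        """
        element = url_normalize(element)  # only use normalized URLs
        if element not in self.hashtable:
            heapq.heappush(self.heap, (priority, element))
            self.hashtable.add(element)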
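
The add method keeps a heap of (priority, url) pairs plus a set of URLs already queued, and checks membership only after normalizing, so two spellings of one page are enqueued once. A self-contained sketch of that structure follows, assuming heapq ordering (the lowest priority value pops first). The class name UrlFrontier, its pop method, and the fragment-stripping normalize function are hypothetical stand-ins, not Balerion's code or urlnorm's rules.

```python
import heapq
from urllib.parse import urldefrag


def normalize(url: str) -> str:
    # Hypothetical stand-in for url_normalize: trim whitespace and drop the #fragment.
    return urldefrag(url.strip()).url


class UrlFrontier:
    """Priority queue of URLs that ignores URLs it has already queued (sketch)."""

    def __init__(self) -> None:
        self.heap = []    # (priority, url) pairs; the smallest priority pops first
        self.seen = set()  # every normalized URL ever added

    def add(self, url: str, priority: int) -> bool:
        url = normalize(url)  # compare and store normalized URLs only
        if url in self.seen:
            return False
        heapq.heappush(self.heap, (priority, url))
        self.seen.add(url)
        return True

    def pop(self) -> str:
        # raises IndexError when the frontier is empty
        return heapq.heappop(self.heap)[1]
```

Adding http://example.com/a#intro and then http://example.com/a queues the page once. As in the original add, a later and better priority for an already-queued URL is simply ignored.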

Example #2

File: crawler.py  Project: rohit-nsit08/Balerion

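import sys

from urlnorm import url_normalize

# ... class Balerion and the start of this method (including its docstring) are not
# shown in the excerpt; LOGGER is a logger defined elsewhere in crawler.py.
        self.pre_process()
        LOGGER.info("starting at (%s)...", self.root)
        count = 0
        # pop URLs off the queue until it is empty or max_limit pages have been fetched
        while self.unparsed_urls.heap and count < self.max_limit:
            # get the next link to visit
            url = self.unparsed_urls.get()
            count += 1
            # fetch the page
            page = self.fetch_url(url)
            # only parse successful HTML responses; a missing Content-Type counts as non-HTML
            content_type = page.headers.get('content-type', '')
            if page.status not in (404, 403, 500) and 'text/html' in content_type:
                LOGGER.info("visited: %s", url)
                self.process_page(page)
                self.process_page_links(page.body, page.url)
        return count


if __name__ == '__main__':
    try:
        INPUT_URL = sys.argv[1]
        ALLOW_EXTERNAL = int(sys.argv[2])
        ALLOW_REDIRECTS = int(sys.argv[3])
        MAX_LIMIT = int(sys.argv[4])
    except (IndexError, ValueError):
        LOGGER.error("Error: incorrect start URL or options were passed\n"
                     " note: all four parameters are required: "
                     "url allow_external allow_redirects max_limit")
        sys.exit(1)

    # normalize the start URL so it matches the normalized URLs stored in the queue
    BELA = Balerion(url_normalize(INPUT_URL), ALLOW_EXTERNAL, ALLOW_REDIRECTS, MAX_LIMIT)
    BELA.crawl()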
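
The loop in Example #2 pops URLs until the queue is empty or max_limit fetches have been made, and only HTML pages with an acceptable status have their links queued. Below is a network-free sketch of the same control flow. The in-memory PAGES table, the Page tuple, and the fetch, normalize, and crawl functions are hypothetical stand-ins for Balerion's fetch_url, process_page, and process_page_links, and a FIFO deque replaces the priority heap to keep the example short.

```python
from collections import deque
from typing import Dict, List, NamedTuple
from urllib.parse import urldefrag, urljoin


class Page(NamedTuple):
    status: int
    content_type: str
    links: List[str]  # raw href values as they appear in the page


# Hypothetical in-memory "web" so the sketch runs without network access
PAGES: Dict[str, Page] = {
    "http://example.com/": Page(200, "text/html", ["/a", "/b#top", "http://example.com/"]),
    "http://example.com/a": Page(200, "text/html", ["/b", "/missing"]),
    "http://example.com/b": Page(200, "application/pdf", []),
}


def fetch(url: str) -> Page:
    # unknown URLs behave like a 404 response
    return PAGES.get(url, Page(404, "text/html", []))


def normalize(url: str) -> str:
    # simplified stand-in for url_normalize: drop the #fragment only
    return urldefrag(url).url


def crawl(root: str, max_limit: int) -> List[str]:
    """Return the URLs whose pages were parsed, fetching at most max_limit pages."""
    root = normalize(root)
    queue = deque([root])
    seen = {root}
    visited = []
    count = 0
    while queue and count < max_limit:
        url = queue.popleft()
        count += 1  # every fetch counts toward the limit, as in the original loop
        page = fetch(url)
        if page.status not in (404, 403, 500) and "text/html" in page.content_type:
            visited.append(url)
            for href in page.links:
                link = normalize(urljoin(url, href))  # resolve relative links first
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return visited
```

With a generous limit, the PDF page and the 404 page are fetched but not parsed, so only the root and /a are returned; with max_limit=1 only the root is fetched.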