Example #1
    def classify_url(self, domain, page, depth=0):
        """
        Classify the documents after crawling them.

        args:
            domain - the domain part of the url
            page - the other part of the url
            depth - how deep to crawl

        returns:
            a list of predicted probabilities for each instance belonging to
            each class
        """
        # get the documents
        documents, _ = crawl_page(domain, page, depth=depth)

        # parse the documents, keeping only recognized English words that
        # are in the model's vocabulary and are not stop words
        string_data = []
        for doc in documents.values():
            words = parse_html_simple(doc)
            parsed = []
            for word in words:
                if (word in self.english_words
                        and word not in self.stop_words
                        and word in self.vocabulary):
                    parsed.append(word)
            string_data.append(' '.join(parsed))

        count_data = self.vectorizer.transform(string_data)

        # classify the documents
        probs = self.classifier.predict_proba(count_data)
        return probs
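
A short usage sketch, not from the source: assuming the surrounding class has already been built with a fitted vectorizer and classifier plus its vocabulary, English word list, and stop words, classify_url can be called directly. The class name DocumentClassifier, its constructor call, and the URL parts below are hypothetical.

# DocumentClassifier is a hypothetical name for the class that defines
# classify_url; its vectorizer and classifier are assumed to be fitted.
clf = DocumentClassifier()
probs = clf.classify_url('https://en.wikipedia.org', '/wiki/Web_scraping', depth=1)
# predict_proba yields one row of class probabilities per crawled document
print(probs)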
Example #2
def call_scraper(args):
    # unpack the single (domain, link) argument
    domain, link = args

    return crawl_page(domain, link, href_match='/wiki/', depth=1)
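
A plausible usage sketch, not confirmed by the source: a wrapper that accepts a single args tuple is the usual shape for multiprocessing.Pool.map, which hands each worker exactly one object per task. The URLs below are hypothetical.

import multiprocessing

if __name__ == '__main__':
    tasks = [
        ('https://en.wikipedia.org', '/wiki/Web_scraping'),
        ('https://en.wikipedia.org', '/wiki/Text_classification'),
    ]
    # each worker invokes crawl_page via call_scraper on one (domain, link) pair
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(call_scraper, tasks)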