Example #1
import re

def crawl_sitemap(url):
    # download the sitemap file
    sitemap = Download(url)
    # extract the sitemap links
    links = re.findall('<loc>(.*?)</loc>', sitemap.decode("utf-8"))
    # download each link
    for link in links:
        html = Download(link)
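Every snippet on this page relies on a Download helper (used either as a plain function or through a download method) that is defined elsewhere in each project. As a rough idea of the contract these examples depend on, here is a minimal sketch, assuming Download simply fetches a URL, returns the raw response bytes, and returns None on failure; the real implementations differ per project.

import urllib.request
import urllib.error


def Download(url, num_retries=2):
    """Minimal stand-in for the Download helper (an assumption, not the projects' code)."""
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except urllib.error.URLError as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0 and hasattr(e, 'code') and 500 <= e.code < 600:
            # retry on 5xx server errors
            return Download(url, num_retries - 1)
    return html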
Example #2
import datetime

# URLManager, Download, HtmlParser and DataOutput are the project's own helper classes
class SpiderMain(object):
    
    def __init__(self):
        self.manager = URLManager()
        self.down = Download()
        self.parser = HtmlParser()
        self.output = DataOutput()
    
    def crawl(self, root_url):
        content = self.down.download(root_url)
        movie_ids = self.parser.parse_urls(content)
        count = 0
        
        for mid in movie_ids:
            if count > 10:
                break
            movie_link = (
                'http://service.library.mtime.com/Movie.api'
                '?Ajax_CallBack=true'
                '&Ajax_CallBackType=Mtime.Library.Services'
                '&Ajax_CallBackMethod=GetMovieOverviewRating'
                '&Ajax_CrossDomain=1'
                '&Ajax_RequestUrl=http%3A%2F%2Fmovie.mtime.com%2F{0}%2F'
                '&t={1}'
                '&Ajax_CallBackArgument0={2}'
            ).format(mid, datetime.datetime.now().strftime("%Y%m%d%H%M%S%f"), mid)

            res = self.down.download(movie_link)
            self.parser.parser_json(res)
            count += 1
        
        self.output.store_data(self.parser.items)
        self.output.close_connect()
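A minimal driver for this spider might look like the sketch below; the seed URL is borrowed from Example #8 on this page and is only an assumption about what parse_urls expects.

if __name__ == '__main__':
    spider = SpiderMain()
    # seed URL taken from Example #8 (assumption; not necessarily the page parse_urls was written for)
    spider.crawl('http://theater.mtime.com/China_Beijing/')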
Example #3
import re

def crawl_sitemap(url):
    # download the sitemap file
    sitemap = Download(url)
    #>Downloading: http://example.webscraping.com/sitemap.xml
    # extract the sitemap links
    links = re.findall('<loc>(.*?)</loc>', sitemap)
    # download each link
    for link in links:
        html = Download(link)
Example #4
import re

def link_crawler(seed_url, link_regex):
    """Crawl from the given seed URL following links matched by link_regex
    """
    crawl_queue = [seed_url]  # the queue of URLs to download
    while crawl_queue:
        url = crawl_queue.pop()
        html = Download(url)
        # filter for links matching our regular expression
        for link in get_links(html):
            if re.match(link_regex, link):
                # add this link to the crawl queue
                crawl_queue.append(link)
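The get_links helper is not shown on this page. A plausible implementation, assuming it simply pulls every href value out of the downloaded HTML with a regular expression, might be:

import re


def get_links(html):
    """Return a list of links found in the given HTML page (sketch)."""
    if isinstance(html, bytes):
        # the Download helper may return raw bytes; decode before matching
        html = html.decode('utf-8', errors='ignore')
    # grab the value of every href attribute in an <a> tag
    webpage_regex = re.compile(r"""<a[^>]+href=["'](.*?)["']""", re.IGNORECASE)
    return webpage_regex.findall(html)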
Example #5
import itertools

def iteration():
    for page in itertools.count(1):
        url = 'http://example.webscraping.com/view/-%d' % page
        #url = 'http://example.webscraping.com/view/-{}'.format(page)
        html = Download(url)
        if html is None:
            # received an error trying to download this webpage
            # so assume have reached the last country ID and can stop downloading
            break
        else:
            # success - can scrape the result
            # ...
            pass
Example #6
import re
try:
    import urlparse                        # Python 2
except ImportError:
    from urllib import parse as urlparse   # Python 3

def link_crawler(seed_url, link_regex):
    """Crawl from the given seed URL following links matched by link_regex
    """
    crawl_queue = [seed_url]
    seen = set(crawl_queue)  # keep track of which URLs have been seen before
    while crawl_queue:
        url = crawl_queue.pop()
        html = Download(url)
        for link in get_links(html):
            # check if link matches expected regex
            if re.match(link_regex, link):
                # form absolute link
                link = urlparse.urljoin(seed_url, link)
                # only queue links that have not been seen before
                if link not in seen:
                    seen.add(link)
                    crawl_queue.append(link)
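As a usage sketch: the seed URL below reuses the example site from the other snippets on this page, while the link regex is purely an assumption about what a caller might pass in.

# hypothetical call: crawl the example site, following index and view pages only
link_crawler('http://example.webscraping.com', '/(index|view)')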
Example #7
import itertools

def iteration():
    """Exit after N consecutive download errors"""
    max_errors = 5  # maximum number of consecutive download errors allowed
    num_errors = 0  # current number of consecutive download errors
    for page in itertools.count(1):
        url = 'http://example.webscraping.com/places/default/view/-{}'.format(
            page)
        html = Download(url)
        if html is None:
            # received an error trying to download this webpage
            num_errors += 1
            if num_errors == max_errors:
                # reached the maximum number of consecutive errors,
                # so assume we have passed the last country ID and stop
                break
        else:
            # success - can scrape the result
            # ...
            num_errors = 0
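Resetting num_errors to 0 after every successful download means only consecutive failures count toward max_errors, so a single transient error does not stop the crawl, while running past the last valid ID (which fails repeatedly) does.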
Example #8
File: test.py  Project: chnhgn/crawl
from common import Download
from parsers import HtmlParser

dl = Download()
parse = HtmlParser()

content = dl.download('http://theater.mtime.com/China_Beijing/')
res = parse._parse_movies(content)
print(res)
Example #9
def __init__(self):
    self.manager = URLManager()
    self.down = Download()
    self.parser = HtmlParser()
    self.output = DataOutput()