def crawData5u(self, pageCount=1):
    """Crawl free proxies from data5u.com."""
    startUrl = 'http://www.data5u.com/free/gngn/index.shtml'
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Cookie': 'JSESSIONID=694DB8BC18C0697975ABD4D10A216C38',
        'Host': 'www.data5u.com',
        'Referer': 'http://www.data5u.com/free/index.shtml',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36',
    }
    # Note: startUrl is never advanced in this loop, so the default
    # pageCount of 1 crawls just the index page.
    for count in range(pageCount):
        print("Crawling {}, page {}".format(startUrl, count + 1))
        source = getPage(startUrl, option=headers)
        html = etree.HTML(source)
        items = html.xpath("//div[@class='wlist']//li//ul")
        for item in items[1:]:  # skip the header row
            speed = item.xpath(".//span[8]/li/text()")
            # Drop proxies slower than 4 seconds ('秒' is the "seconds"
            # suffix on the scraped page).
            if float(speed[0].replace('秒', '').strip()) > 4.0:
                continue
            ip = item.xpath(".//span[1]/li/text()")
            port = item.xpath(".//span[2]/li/text()")
            yield ":".join([ip[0], port[0]])
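# All of these crawlers depend on lxml's etree (from lxml import etree) and a
# module-level getPage(url, option=None) helper, neither of which is shown in
# this section. Below is a minimal sketch of what getPage might look like; it
# assumes the requests library and a plain GET with optional headers, while the
# project's real helper may add retries, proxy rotation, or encoding handling.
def getPage(url, option=None):
    """Fetch a page and return its HTML source, or None on failure (sketch)."""
    import requests
    try:
        resp = requests.get(url, headers=option, timeout=10)
        if resp.status_code == 200:
            return resp.text
    except requests.RequestException:
        pass
    return None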
def crawIp66DL(self, pageCount=4):
    """Crawl free proxies from 66ip.cn (HTTP & HTTPS)."""
    startUrl = "http://www.66ip.cn"
    for count in range(pageCount):
        print("Crawling {}, page {}".format(startUrl, count + 1))
        source = getPage(startUrl)
        html = etree.HTML(source)
        items = html.xpath("//div[@id='main']//tbody//tr")
        for item in items:
            ip = item.xpath(".//td[1]/text()")
            port = item.xpath(".//td[2]/text()")
            yield ":".join([ip[0], port[0]])
        # The index page is page 1, so the next request targets page
        # count + 2; str() avoids a TypeError from concatenating an int.
        startUrl = "http://www.66ip.cn/" + str(count + 2)
def crawKuaiDL(self, pageCount=4):
    """Crawl free HTTP proxies from kuaidaili.com."""
    startUrl = "https://www.kuaidaili.com/free/inha/"
    for count in range(pageCount):
        print("Crawling {}, page {}".format(startUrl, count + 1))
        source = getPage(startUrl)
        html = etree.HTML(source)
        items = html.xpath("//div//div[@id='list']//tbody/tr")
        for item in items:
            speed = item.xpath(".//td[6]/text()")
            # Drop proxies slower than 4 seconds.
            if float(speed[0].replace('秒', '').strip()) > 4.0:
                continue
            ip = item.xpath(".//td[1]/text()")
            port = item.xpath(".//td[2]/text()")
            yield ":".join([ip[0], port[0]])
        # The start URL is page 1, so advance to page count + 2;
        # str() avoids a TypeError from concatenating an int.
        startUrl = "https://www.kuaidaili.com/free/inha/" + str(count + 2)
def crawXici(self, pageCount=4):
    """Crawl free HTTP proxies from xicidaili.com."""
    startUrl = "https://www.xicidaili.com/wt/"
    for count in range(pageCount):
        print("Crawling {}, page {}".format(startUrl, count + 1))
        source = getPage(startUrl)
        html = etree.HTML(source)
        items = html.xpath("//table[@id='ip_list']//tr")
        for item in items[1:]:  # skip the header row
            speed = item.xpath(".//td[7]/div/@title")
            # Drop proxies slower than 4 seconds.
            if float(speed[0].replace('秒', '').strip()) > 4.0:
                continue
            ip = item.xpath(".//td[2]/text()")
            port = item.xpath(".//td[3]/text()")
            yield ":".join([ip[0], port[0]])
        # Follow the site's own "next page" link instead of building
        # the URL by hand.
        nextLink = html.xpath("//div[@class='pagination']//a[@class='next_page']/@href")
        if nextLink:
            startUrl = "https://www.xicidaili.com" + nextLink[0]
def crawYunDL(self, pageCount=4):
    """Crawl free proxies from ip3366.net (HTTP & HTTPS); the site lists at most 7 pages."""
    if pageCount > 7:
        print("Maximum is 7 pages; pageCount has been capped at 7.")
        pageCount = 7
    startUrl = "http://www.ip3366.net/free/?stype=1"
    for count in range(pageCount):
        print("Crawling {}, page {}".format(startUrl, count + 1))
        source = getPage(startUrl)
        html = etree.HTML(source)
        items = html.xpath("//div[@id='list']//tbody//tr")
        for item in items:
            speed = item.xpath(".//td[6]/text()")
            # Drop proxies slower than 4 seconds.
            if float(speed[0].replace('秒', '').strip()) > 4.0:
                continue
            ip = item.xpath(".//td[1]/text()")
            port = item.xpath(".//td[2]/text()")
            yield ":".join([ip[0], port[0]])
        # The start URL is page 1, so request page count + 2 next;
        # str() avoids a TypeError from concatenating an int.
        startUrl = "http://www.ip3366.net/free/?stype=1&page=" + str(count + 2)
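# Usage sketch: each craw* method is a generator that yields "ip:port"
# strings, so a caller can chain several sources and deduplicate with a set.
# The Crawler class name below is an assumption for illustration; substitute
# whatever class actually holds these methods, and note the real project may
# feed the results into a database or validator instead of a set.
if __name__ == "__main__":
    crawler = Crawler()  # hypothetical class holding the craw* methods
    proxies = set()
    for source in (crawler.crawIp66DL, crawler.crawKuaiDL, crawler.crawXici):
        proxies.update(source(pageCount=2))  # each yields "ip:port" strings
    print("Collected {} unique proxies".format(len(proxies)))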