Python Scraper.get_scraper examples

Programming language: Python

Namespace/package name: _scrapers

Class/type: Scraper

Method/function: get_scraper

Examples at hotexamples.com: 5

Python Scraper.get_scraper: 5 examples found. These are the top-rated real-world Python examples of _scrapers.Scraper.get_scraper, extracted from open source projects.

Frequently used methods

Scraper(3)

get_scraper(3)

get_item_info(2)

download(1)

load(1)

Example #1

File: test_scrapers.py Project: bosswissam/scraper

    @classmethod
    def setUpClass(cls):
        '''Set up saved pages for all URLs in TEST_URLS and load them so tests can
        access them easily.

        The most important step is populating cls.rows:
        - At the start, each row holds only the item URL and image URL from the
          CSV file.
        - Because every test case would otherwise load a file from disk, the saved
          page is pre-loaded here. The objects a test will use (e.g. the Etsy
          listing object and Etsy seller object) are appended after the second
          element.
        - Finally, the matching scraper object is appended, so tests can retrieve
          it as the last element instead of calling constructors repeatedly.

        Note: the only requirement is that the test writer knows which row to use
        for each test.
        '''
        cls.cur_dir = os.getcwd()
        reader = sopen(TEST_URLS)
        factory = Scraper()
        chdir(DOWNLOAD_DIR)

        cls.rows = []
        for row in reader:
            row = row.split(",", 1)
            domain = get_domain(row[0])
            # Keep the factory separate so a per-site result never replaces it.
            scraper = factory.get_scraper(domain)
            if REWRITE or not exists(DOWNLOAD_DIR):
                scraper.download(row[0])
            row.extend(scraper.load(row[0]))
            row.append(scraper)
            cls.rows.append(row)

        cls._test_is_set = True
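
The loop above depends on `get_scraper` mapping a domain string to a site-specific scraper, but none of the examples on this page show how it is implemented. The following is only a minimal sketch of one way such a factory could work. The `_registry` dict, the bodies of `AmazonScraper` and `EtsyScraper`, and the choice to return `None` for unsupported domains are all assumptions made for illustration (the last one is suggested by the truthiness check on the returned scraper in `_pinscraperow`); this is not the project's actual code.

```python
class Scraper:
    """Base class whose get_scraper acts as a simple factory (hypothetical)."""

    # Hypothetical registry mapping a domain to a scraper class; filled in below.
    _registry = {}

    def get_scraper(self, domain):
        """Return a scraper instance for `domain`, or None if it is unsupported."""
        scraper_cls = self._registry.get(domain.lower())
        return scraper_cls() if scraper_cls else None

    def get_item_info(self, url, img_url):
        raise NotImplementedError("site-specific scrapers implement this")


class AmazonScraper(Scraper):
    def get_item_info(self, url, img_url):
        # Placeholder: a real scraper would fetch and parse the page here.
        return {"url": url, "image": img_url, "source": "amazon"}


class EtsyScraper(Scraper):
    def get_item_info(self, url, img_url):
        # Placeholder, same caveat as above.
        return {"url": url, "image": img_url, "source": "etsy"}


# Register the site-specific classes once both are defined.
Scraper._registry.update({
    "www.amazon.com": AmazonScraper,
    "www.etsy.com": EtsyScraper,
})

factory = Scraper()
amazon = factory.get_scraper("www.amazon.com")   # an AmazonScraper instance
unknown = factory.get_scraper("example.org")     # None: no scraper registered
```

Under this design, callers should check the result before using it, since an unregistered domain yields `None` rather than raising.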

Example #2

File: test_scrapers.py Project: skyelong/scraper

    @classmethod
    def setUpClass(cls):
        '''Set up saved pages for all URLs in TEST_URLS and load them so tests can
        access them easily.

        The most important step is populating cls.rows:
        - At the start, each row holds only the item URL and image URL from the
          CSV file.
        - Because every test case would otherwise load a file from disk, the saved
          page is pre-loaded here. The objects a test will use (e.g. the Etsy
          listing object and Etsy seller object) are appended after the second
          element.
        - Finally, the matching scraper object is appended, so tests can retrieve
          it as the last element instead of calling constructors repeatedly.

        Note: the only requirement is that the test writer knows which row to use
        for each test.
        '''
        cls.cur_dir = os.getcwd()
        reader = sopen(TEST_URLS)
        factory = Scraper()
        chdir(DOWNLOAD_DIR)

        cls.rows = []
        for row in reader:
            row = row.split(",", 1)
            domain = get_domain(row[0])
            # Keep the factory separate so a per-site result never replaces it.
            scraper = factory.get_scraper(domain)
            if REWRITE or not exists(DOWNLOAD_DIR):
                scraper.download(row[0])
            row.extend(scraper.load(row[0]))
            row.append(scraper)
            cls.rows.append(row)

        cls._test_is_set = True

Example #3

File: test_scrapers.py Project: bosswissam/scraper

    def test_amazon_scraper(self):
        '''Test get_item_info for AmazonScraper
        '''
        scraper = Scraper()
        scraper = scraper.get_scraper('www.amazon.com')
        item = scraper.get_item_info(
            'http://www.amazon.com/gp/product/B002P8T0L0/ref=s9_simh_gw_p23_d0_g23_i1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-2&pf_rd_r=0WQ1VFHRSY7ZTB93FGYG&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846',
            'http://ecx.images-amazon.com/images/I/31hak2cSIOL.jpg')
        self.assertEqual(item.price, 75.99)
        self.assertEqual(item.currency_code, '$')
        self.assertEqual(item.user_interaction.likes, 42)
        self.assertEqual(item.quantity.new, 5)
        self.assertEqual(item.details.discount.value, 43.96)
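
The assertions above read nested attributes such as `item.user_interaction.likes`, but the item class itself is not shown on this page. Purely as an illustration, an object with that shape can be stubbed with `types.SimpleNamespace`, which may be useful for exercising test logic without a live request. The field values are copied from the test; the helper name `make_stub_item` and the use of `SimpleNamespace` and `math.isclose` are assumptions of this sketch, not part of the original project.

```python
import math
from types import SimpleNamespace


def make_stub_item():
    """Build a stand-in item with the nested attributes the test reads (hypothetical)."""
    return SimpleNamespace(
        price=75.99,
        currency_code="$",
        user_interaction=SimpleNamespace(likes=42),
        quantity=SimpleNamespace(new=5),
        details=SimpleNamespace(discount=SimpleNamespace(value=43.96)),
    )


item = make_stub_item()

# Floats parsed from page text are safer to compare with a tolerance
# than with strict equality.
price_ok = math.isclose(item.price, 75.99, rel_tol=1e-9)
likes = item.user_interaction.likes
discount = item.details.discount.value
```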

Example #4

File: test_scrapers.py Project: skyelong/scraper

    def test_amazon_scraper(self):
        '''Test get_item_info for AmazonScraper
        '''
        scraper = Scraper()
        scraper = scraper.get_scraper('www.amazon.com')
        item = scraper.get_item_info(
            'http://www.amazon.com/gp/product/B002P8T0L0/ref=s9_simh_gw_p23_d0_g23_i1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-2&pf_rd_r=0WQ1VFHRSY7ZTB93FGYG&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846',
            'http://ecx.images-amazon.com/images/I/31hak2cSIOL.jpg')
        self.assertEqual(item.price, 75.99)
        self.assertEqual(item.currency_code, '$')
        self.assertEqual(item.user_interaction.likes, 42)
        self.assertEqual(item.quantity.new, 5)
        self.assertEqual(item.details.discount.value, 43.96)

Example #5

    def _pinscraperow(row, row_num):
        # Returns True when a scraper handled the row, otherwise the unsupported domain.
        url = row[0].strip()
        img_url = row[1].strip()
        # URL-encode the item URL so it can serve as a directory name.
        dir_name = urllib.parse.quote_plus(url)
        mkdir(dir_name)
        download_image(img_url, dir_name)
        domain = get_domain(url)
        scraper = Scraper().get_scraper(domain)
        if scraper:
            print("Getting information from {0}... ".format(domain))
            content = scraper.get_item_info(url, img_url)
            if content:
                json_dump_to_file('{0}/info.json'.format(dir_name), content)
            else:
                write_to_file('{0}/not_found.txt'.format(dir_name), 'w',
                              'The url at {0} was not found'.format(url))
            return True
        else:
            return domain
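
The function above leans on two small steps that are easy to try in isolation: deriving a domain from the item URL and turning the URL into a safe directory name. The project's own `get_domain` is not shown on this page, so the version below is a hypothetical stand-in built on the standard library's `urllib.parse.urlparse`; only `quote_plus` is taken directly from the example. The sample URL is a shortened form of the one used in the Amazon test.

```python
from urllib.parse import quote_plus, urlparse


def get_domain(url):
    """Hypothetical helper: return the host part of a URL, lowercased."""
    return urlparse(url.strip()).netloc.lower()


url = "http://www.amazon.com/gp/product/B002P8T0L0"

# The host part is what a domain-based lookup such as get_scraper would receive.
domain = get_domain(url)

# quote_plus percent-encodes ':' and '/', so the result contains no path
# separators and can be used as a single directory name.
dir_name = quote_plus(url)
```

Note that `urlparse` only fills in `netloc` when the URL includes a scheme (or starts with `//`), so rows without `http://` would need extra handling in a real helper.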