def test_scrap(self):
    """Replay a recorded cassette and check the documents the scraper yields."""
    nb_doc = 4  # to keep test short
    curr_doc = 0
    scraper = Scraper(disconnected=True)
    directory = os.path.dirname(os.path.abspath(__file__))
    with vcr.use_cassette(directory + '/vcr_cassettes/test_run_scraper.yaml',
                          record_mode='none', ignore_localhost=True):
        for doc in scraper.scrap():
            self.assertIsInstance(doc.url, unicode)
            self.assertIsInstance(doc.title, unicode)
            self.assertIsInstance(doc.content, unicode)
            self.assertNotIn(u'.gif', doc.url)  # check extension filter
            self.assertNotIn(u'youtu', doc.url)  # check regex filter
            curr_doc += 1
            if curr_doc == nb_doc:
                break
        else:  # for-else: the cassette ran out before yielding nb_doc documents
            self.fail('error: not enough docs extracted from cassette, should be '
                      + str(nb_doc) + ', was ' + str(curr_doc))
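# For reference, a hypothetical helper sketching how the cassette replayed by
# test_scrap could be (re)recorded against the live service. It assumes the
# same Scraper API in connected mode; record_mode='once' is standard vcrpy
# behavior (record on the first run, replay on subsequent runs).
def record_cassette(nb_doc=4):
    directory = os.path.dirname(os.path.abspath(__file__))
    scraper = Scraper()  # connected mode, hits the real endpoints
    with vcr.use_cassette(directory + '/vcr_cassettes/test_run_scraper.yaml',
                          record_mode='once', ignore_localhost=True):
        for curr_doc, doc in enumerate(scraper.scrap(), start=1):
            if curr_doc == nb_doc:
                break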
import codecs
import datetime
import logging
from time import sleep

import jsonpickle


def run():
    """Scrape continuously, dumping each document to a timestamped JSON file."""
    jsonpickle.set_encoder_options('simplejson', indent=4, ensure_ascii=False)
    scraper = Scraper()
    folder = '/media/nico/SAMSUNG/devs/gator/scraping reddit 10-01-2016'
    log_file = folder + '/run_scraper-' + str(datetime.datetime.utcnow()) + '.log'
    logging.basicConfig(format=u'%(asctime)s : %(levelname)s : %(message)s',
                        level=logging.INFO, filename=log_file)
    while True:
        try:
            for scraper_document in scraper.scrap():
                filename = folder + '/' + str(datetime.datetime.utcnow()) + '.json'
                json_doc = jsonpickle.encode(scraper_document)
                with codecs.open(filename=filename, mode='w', encoding='utf-8') as file_desc:
                    file_desc.write(json_doc)
        except Exception as exception:  # pylint: disable=broad-except
            logging.error("The orchestrator crashed! Starting it over ...")
            logging.exception(exception)
            sleep(30)
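# A minimal sketch of reading one of the dumped files back; load_document is a
# hypothetical helper, but jsonpickle.decode is the standard counterpart of
# jsonpickle.encode and restores the original document object.
def load_document(filename):
    with codecs.open(filename=filename, mode='r', encoding='utf-8') as file_desc:
        return jsonpickle.decode(file_desc.read())


if __name__ == '__main__':
    run()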