import time
import unittest

from elasticsearch import Elasticsearch

# Project-local imports; the module paths here are assumed and may need to be
# adjusted to match your package layout.
from pipelines import ElasticsearchPipeline
from spiders import CustomScraper


class TestElasticsearchPipeline(unittest.TestCase):

    def setUp(self):
        self.es = Elasticsearch()
        self.pipeline = ElasticsearchPipeline()
        self.spider = CustomScraper(
            index="test_index",
            start_urls=["http://www.dmoz.org"],
            parser_string="//a",
            parser_dict={
                "text": "text()",
                "link": "@href",
            },
        )
        self.pipeline.open_spider(self.spider)

    def tearDown(self):
        # Clean up the test index so runs stay independent.
        self.es.indices.delete(self.spider.index)

    def get_index_content(self):
        return self.es.search(self.spider.index, doc_type=self.spider.name)

    def get_index_length(self):
        return self.get_index_content()["hits"]["total"]

    # Wait until the number of documents in the index is stable, then allow
    # for assertions.
    def index_creation_wait(self):
        initial_length = self.get_index_length()
        time.sleep(1)
        while initial_length != self.get_index_length():
            time.sleep(1)
            initial_length = self.get_index_length()

    def test_process_item(self):
        item = {"link": ["http://www.continuum.io/"]}
        for x in range(1000):
            self.pipeline.process_item(item, self.spider)
        self.index_creation_wait()
        assert self.get_index_length() == 1000
        assert not self.pipeline.batch

        # Delete the previous index and start over with a new pipeline.
        self.pipeline.open_spider(self.spider)
        self.pipeline.process_item(item, self.spider)
        self.pipeline.close_spider(self.spider)
        self.index_creation_wait()
        assert self.get_index_length() == 1
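
# A minimal way to run this suite directly (a sketch, not part of the
# original tests). It assumes a live Elasticsearch node is reachable on the
# default localhost:9200, since setUp connects with Elasticsearch() and
# nothing is mocked.
if __name__ == "__main__":
    unittest.main()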