Python XpathUtil Examples

Programming language: Python

Namespace/package name: scrapymasters.util.XPathUtil

Class/type: XpathUtil

Examples on hotexamples.com: 6

Python XpathUtil: 6 examples found. These are top-rated, real-world Python examples of scrapymasters.util.XPathUtil.XpathUtil extracted from open-source projects.

Frequently used methods

xpath_for_class(3)

Example #1

File: BBCSpider.py, project: ricjhill/scrape-bbc

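    def parse(self, response):
        # Absolute query ("//" prefix): every element in the page whose class
        # list contains the token "media__content".
        articles = response.xpath("//" +
                                  XpathUtil.xpath_for_class('media__content'))
        for article in articles:
            item = GuardianItem()

            # The queries below have no "//" prefix, so they are evaluated
            # relative to `article` and match only its direct child elements.
            item['title'] = StringUtil.get_first(
                article.xpath(
                    XpathUtil.xpath_for_class('media__title') +
                    "/a/text()").extract(), "").strip(' \n')
            item['tags'] = StringUtil.get_first(
                article.xpath(
                    XpathUtil.xpath_for_class('media__tag') +
                    "/text()").extract(), "").strip(' \n')
            item['summary'] = StringUtil.get_first(
                article.xpath(
                    XpathUtil.xpath_for_class('media__summary') +
                    "/text()").extract(), "").strip(' \n')

            # Link target of the title anchor, made absolute against the page URL.
            article_url = ''.join(
                article.xpath(
                    XpathUtil.xpath_for_class("media__title") +
                    "/a/@href").extract())

            url = response.urljoin(article_url)

            # Follow the link. Scrapy stores `meta` as a plain dict, so the
            # callback receives a dict copy of the item's fields.
            yield scrapy.Request(url,
                                 callback=self.parse_dir_contents,
                                 meta=item)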
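The XPath strings that `parse()` hands to `response.xpath()` and `article.xpath()` can be reproduced without Scrapy. This sketch assumes `xpath_for_class` returns the string checked by the unit tests in Examples #5 and #6, and that `StringUtil.get_first(values, default)` returns the first list element, or `default` when the list is empty. Both helpers here are hypothetical reimplementations for illustration, not the project's code.

```python
def xpath_for_class(class_name):
    # Hypothetical reimplementation; its output matches the string in the tests.
    return "*[contains(concat(' ', @class, ' '), ' %s ')]" % class_name


def get_first(values, default):
    # Hypothetical stand-in for StringUtil.get_first: first element or default.
    return values[0] if values else default


# Absolute query: any element in the document with the class token "media__content".
articles_xpath = "//" + xpath_for_class("media__content")

# Relative queries: evaluated against one article selector, so the leading "*"
# means "a direct child element of that article".
title_xpath = xpath_for_class("media__title") + "/a/text()"
href_xpath = xpath_for_class("media__title") + "/a/@href"

print(articles_xpath)
# → //*[contains(concat(' ', @class, ' '), ' media__content ')]
print(title_xpath)
# → *[contains(concat(' ', @class, ' '), ' media__title ')]/a/text()

# How the spider turns an extracted text list into a field value.
print(repr(get_first(["  Headline text\n"], "").strip(" \n")))  # → 'Headline text'
print(repr(get_first([], "").strip(" \n")))                     # → ''
```

Because the per-article queries lack a leading `//` or `.//`, a `media__title` element nested deeper than one level inside `media__content` would not be found.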

Example #2

File: BBCSpider.py, project: waxmittmann/scrape-bbc

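    def parse_dir_contents(self, response):
        # Fields set in parse(), passed through Request(meta=...). response.meta
        # can also hold keys added by Scrapy itself, such as 'depth'.
        item = response.meta

        # First text node of an element with class "story-body__h1", or "" if none.
        header = StringUtil.get_first(
            response.xpath("//" + XpathUtil.xpath_for_class("story-body__h1") + "/text()").extract(), ""
        ).strip(" \n")

        # Every paragraph text node under "story-body__inner", joined with spaces.
        body_list = response.xpath("//" + XpathUtil.xpath_for_class("story-body__inner") + "//p/text()").extract()
        body = " ".join(body_list).strip(" \n")

        item["header"] = header
        item["url"] = response.url
        item["body"] = body
        yield item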
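`parse_dir_contents()` builds the body by joining every `//p/text()` node with one space and then calling `strip(" \n")`. That call trims only spaces and newlines at the two ends, so whitespace inside or between text nodes survives. Also, `//p/text()` returns only text nodes that are direct children of `<p>`, so text inside inline child elements such as `<a>` is left out. The plain-Python sketch below uses made-up fragments to show the whitespace behaviour. The whitespace-collapsing line is an optional alternative, not what the project does.

```python
# Made-up text nodes, as //p/text() might return them (illustrative data only).
fragments = ["\n  First paragraph. ", "Second paragraph.\n", "\tThird."]

# What the spider does: join with one space, trim spaces/newlines at both ends.
body = " ".join(fragments).strip(" \n")
print(repr(body))  # → 'First paragraph.  Second paragraph.\n \tThird.'

# Optional alternative: collapse every whitespace run to a single space.
normalized = " ".join(" ".join(fragments).split())
print(repr(normalized))  # → 'First paragraph. Second paragraph. Third.'
```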

Example #3

File: BBCSpider.py, project: ricjhill/scrape-bbc

    def parse_dir_contents(self, response):
        item = response.meta

        header = StringUtil.get_first(
            response.xpath("//" + XpathUtil.xpath_for_class("story-body__h1") +
                           "/text()").extract(), "").strip(' \n')

        body_list = response.xpath(
            "//" + XpathUtil.xpath_for_class("story-body__inner") +
            "//p/text()").extract()
        body = ' '.join(body_list).strip(' \n')

        item['header'] = header
        item['url'] = response.url
        item['body'] = body
        yield item

Example #4

File: BBCSpider.py, project: waxmittmann/scrape-bbc

    def parse(self, response):
        articles = response.xpath("//" + XpathUtil.xpath_for_class("media__content"))
        for article in articles:
            item = GuardianItem()

            item["title"] = StringUtil.get_first(
                article.xpath(XpathUtil.xpath_for_class("media__title") + "/a/text()").extract(), ""
            ).strip(" \n")
            item["tags"] = StringUtil.get_first(
                article.xpath(XpathUtil.xpath_for_class("media__tag") + "/text()").extract(), ""
            ).strip(" \n")
            item["summary"] = StringUtil.get_first(
                article.xpath(XpathUtil.xpath_for_class("media__summary") + "/text()").extract(), ""
            ).strip(" \n")

            article_url = "".join(article.xpath(XpathUtil.xpath_for_class("media__title") + "/a/@href").extract())

            url = response.urljoin(article_url)

            yield scrapy.Request(url, callback=self.parse_dir_contents, meta=item)

Example #5

File: XpathUtilTest.py, project: waxmittmann/scrape-bbc

    def test_xpath_for_class_should_return_correct_xpath_statement_to_match_class(self):
        result = XpathUtil.xpath_for_class("someclass")

        self.assertEqual(result, "*[contains(concat(' ', @class, ' '), ' someclass ')]")

Example #6

File: XpathUtilTest.py, project: ricjhill/scrape-bbc

    def test_xpath_for_class_should_return_correct_xpath_statement_to_match_class(
            self):
        result = XpathUtil.xpath_for_class("someclass")

        self.assertEqual(
            result, "*[contains(concat(' ', @class, ' '), ' someclass ')]")
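The tests in Examples #5 and #6 fix the exact string `xpath_for_class` returns. Padding both `@class` and the class name with spaces makes `contains()` match a whole class token instead of any substring. The sketch below is a hypothetical reimplementation that satisfies that test. It also includes a plain-Python emulation of the predicate (not an XPath engine) that compares it with a naive `contains(@class, 'name')` check. The padded form still treats only spaces as separators. A class attribute that uses tabs or newlines would additionally need XPath's `normalize-space(@class)` inside the `concat()`.

```python
def xpath_for_class(class_name):
    # Hypothetical reimplementation; only its output is pinned down by the tests.
    return "*[contains(concat(' ', @class, ' '), ' {} ')]".format(class_name)


def padded_match(class_attr, class_name):
    # Emulates contains(concat(' ', @class, ' '), ' name ') on one attribute value.
    return (" " + class_attr + " ").find(" " + class_name + " ") != -1


def naive_match(class_attr, class_name):
    # Emulates the naive contains(@class, 'name').
    return class_name in class_attr


attr = "media__title-link external"
print(naive_match(attr, "media__title"))   # → True (substring of another class)
print(padded_match(attr, "media__title"))  # → False (no whole "media__title" token)
print(padded_match("media media__title", "media__title"))  # → True
```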