Python WebsiteLoader Examples

Programming language: Python

Namespace/package name: dirbot.items

Class/type: WebsiteLoader

Examples at hotexamples.com: 4

Python WebsiteLoader - 4 examples found. These are the top-rated real-world Python examples of dirbot.items.WebsiteLoader, extracted from open source projects. Rating the examples helps improve their quality.

Frequently used methods

WebsiteLoader(2)

add_xpath(2)

load_item(2)

Example #1

File: dmoz.py Project: raymondnuaa/Scrapy_DMOZ

    def parse(self, response):

        #hxs = Selector(response)
        #sites = hxs.select('//ul[@class="directory-url"]/li')
        sites = response.xpath("//div[@class='site-item ']")

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', "div[@class='title-and-desc']/a/div/text()")
            il.add_xpath('url', "div[@class='title-and-desc']/a/@href")
            il.add_xpath('description', "div/div[@class='site-descr ']/text()")
            yield il.load_item()
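The XPath predicates in Example #1 compare the class attribute as an exact string, so the trailing space in 'site-item ' is significant. The sketch below illustrates that behavior with the standard library's xml.etree.ElementTree, which supports a subset of XPath, rather than with Scrapy's selectors. The SAMPLE markup and the count_matches helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up markup: the class value ends with a space, as in Example #1.
SAMPLE = (
    "<body>"
    "<div class='site-item '>"
    "<div class='title-and-desc'><a href='http://example.com/'><div>Example</div></a></div>"
    "</div>"
    "</body>"
)


def count_matches(xml_text, class_value):
    """Count <div> elements whose class attribute equals class_value exactly."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f".//div[@class='{class_value}']"))


with_space = count_matches(SAMPLE, "site-item ")    # exact string, trailing space included
without_space = count_matches(SAMPLE, "site-item")  # same name without the trailing space
```

Only the query that includes the trailing space finds the element. In full XPath 1.0, a predicate such as contains(concat(' ', normalize-space(@class), ' '), ' site-item ') is a common way to avoid depending on the exact whitespace.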

Example #2

File: dmoz.py Project: roy1985715/scrapy_roy

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//ul[@class="directory-url"]/li')

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', 'a/text()')
            il.add_xpath('url', 'a/@href')
            # Raw string avoids the invalid '\s' escape; the pattern is unchanged.
            il.add_xpath('description', 'text()', re=r'-\s([^\n]*?)\n')
            yield il.load_item()
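The re argument passed for the description field keeps only the text between a "- " separator and the next newline. The sketch below applies the same pattern to a single string with the standard re module to show what it captures. The extract_description helper and the sample text are hypothetical, and Scrapy applies the pattern to each selected text node rather than to one string.

```python
import re

# Same pattern as the examples, written as a raw string.
DESCRIPTION_RE = re.compile(r'-\s([^\n]*?)\n')


def extract_description(text):
    """Return the text after '- ' up to the first newline, or None if absent."""
    match = DESCRIPTION_RE.search(text)
    return match.group(1) if match else None


# Made-up text node resembling what follows the link in a list entry.
sample = " - A sample site about Python.\n   "
description = extract_description(sample)
```

A text node with no newline after the dash yields no match, which may be why the pattern suits only markup where each description ends its line.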

Example #3

File: dmoz.py Project: yuseferi/dirbot-mysql

    def parse(self, response):
        """
        The lines below are a spider contract. For more info see:
        http://doc.scrapy.org/en/latest/topics/contracts.html

        @url http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/
        @scrapes name
        """
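        # HtmlXPathSelector and its select() method are legacy Scrapy APIs
        # that later releases deprecated in favor of response.xpath().
        hxs = HtmlXPathSelector(response)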
        sites = hxs.select('//ul[@class="directory-url"]/li')

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', 'a/text()')
            il.add_xpath('url', 'a/@href')
            # Raw string avoids the invalid '\s' escape; the pattern is unchanged.
            il.add_xpath('description', 'text()', re=r'-\s([^\n]*?)\n')
            yield il.load_item()

Example #4

File: dmoz.py Project: CrazyOrr/dirbot-db

    def parse(self, response):
        """
        The lines below are a spider contract. For more info see:
        http://doc.scrapy.org/en/latest/topics/contracts.html

        @url http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/
        @scrapes name
        """
        sites = response.xpath('//ul[@class="directory-url"]/li')

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', 'a/text()')
            il.add_xpath('url', 'a/@href')
            # Raw string avoids the invalid '\s' escape; the pattern is unchanged.
            il.add_xpath('description', 'text()', re=r'-\s([^\n]*?)\n')
            yield il.load_item()