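Python WebsiteLoader Examples

Programming language: Python

Namespace/package name: dirbot.items

Class/type: WebsiteLoader

Examples at hotexamples.com: 4

Python WebsiteLoader - 4 examples found. These are the top-rated real-world Python examples of dirbot.items.WebsiteLoader, extracted from open source projects.

Frequently used methods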

WebsiteLoader(2)

add_xpath(2)

load_item(2)
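
The three calls listed above follow a collect-then-build pattern: construct a loader around a selector, register one XPath expression per field with add_xpath, then call load_item to produce the item. The definition of dirbot.items.WebsiteLoader is not shown on this page, so the sketch below is only a stdlib stand-in for that pattern. MiniLoader, its limited path handling, and the sample markup are all hypothetical; this is not Scrapy's ItemLoader.

```python
import xml.etree.ElementTree as ET


class MiniLoader:
    """Hypothetical stand-in for an item loader (not Scrapy's ItemLoader)."""

    def __init__(self, selector):
        self.selector = selector  # an ElementTree element playing the selector role
        self.values = {}          # field name -> list of collected strings

    def add_xpath(self, field, path):
        # Only "<element-path>/text()" and "<element-path>/@attr" are handled here.
        elem_path, _, last = path.rpartition('/')
        nodes = self.selector.findall(elem_path) if elem_path else [self.selector]
        if last == 'text()':
            found = [n.text for n in nodes if n.text]
        elif last.startswith('@'):
            found = [n.get(last[1:]) for n in nodes if n.get(last[1:]) is not None]
        else:
            raise ValueError('unsupported path: %s' % path)
        self.values.setdefault(field, []).extend(found)

    def load_item(self):
        # Keep the first collected value of each field, stripped of whitespace.
        return {f: v[0].strip() for f, v in self.values.items() if v}


# Made-up markup shaped like the directory list the examples target.
HTML = """
<div>
  <ul class="directory-url">
    <li><a href="http://example.com/a">Site A</a> - First site</li>
    <li><a href="http://example.com/b">Site B</a> - Second site</li>
  </ul>
</div>
""".strip()

root = ET.fromstring(HTML)
items = []
for site in root.findall(".//ul[@class='directory-url']/li"):
    il = MiniLoader(selector=site)
    il.add_xpath('name', 'a/text()')
    il.add_xpath('url', 'a/@href')
    items.append(il.load_item())

print(items)
```

Each loader is bound to one list item, so the relative paths ('a/text()', 'a/@href') are evaluated against that item only, which is the same reason the examples pass selector=site.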

Example #1

File: dmoz.py Project: raymondnuaa/Scrapy_DMOZ

    def parse(self, response):

        #hxs = Selector(response)
        #sites = hxs.select('//ul[@class="directory-url"]/li')
        sites = response.xpath("//div[@class='site-item ']")

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', "div[@class='title-and-desc']/a/div/text()")
            il.add_xpath('url', "div[@class='title-and-desc']/a/@href")
            il.add_xpath('description', "div/div[@class='site-descr ']/text()")
            yield il.load_item()

Example #2

File: dmoz.py Project: roy1985715/scrapy_roy

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//ul[@class="directory-url"]/li')

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', 'a/text()')
            il.add_xpath('url', 'a/@href')
            # Raw string avoids the invalid "\s" escape; the pattern itself is unchanged.
            il.add_xpath('description', 'text()', re=r'-\s([^\n]*?)\n')
            yield il.load_item()
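
The re= argument on the description field keeps only the text between a leading "- " and the next newline. A minimal sketch of what that pattern captures, using only the standard re module; the sample text nodes are made up, and findall is used here as an approximation of how the captured group is collected:

```python
import re

# Same pattern as the re= argument, written as a raw string.
DESCRIPTION_RE = re.compile(r'-\s([^\n]*?)\n')

# Hypothetical text node following the link inside a list item.
text_node = " - A community site for Python programmers.\n            "
matches = DESCRIPTION_RE.findall(text_node)

# Without a trailing newline the pattern does not match at all.
no_match = DESCRIPTION_RE.findall(" - no newline here")

print(matches)
print(no_match)
```

Because the pattern requires a newline after the description, a list item whose text is not followed by a line break would yield no description value.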

Example #3

File: dmoz.py Project: yuseferi/dirbot-mysql

    def parse(self, response):
        """
        The lines below are a spider contract. For more info see:
        http://doc.scrapy.org/en/latest/topics/contracts.html

        @url http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/
        @scrapes name
        """
        # HtmlXPathSelector and .select() are legacy Scrapy APIs; newer code uses response.xpath().
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul[@class="directory-url"]/li')

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', 'a/text()')
            il.add_xpath('url', 'a/@href')
            # Raw string avoids the invalid "\s" escape; the pattern itself is unchanged.
            il.add_xpath('description', 'text()', re=r'-\s([^\n]*?)\n')
            yield il.load_item()
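
The docstring in this example declares spider contracts: lines starting with @ that the `scrapy check` command reads in order to test the callback. Here @url names the page to fetch and @scrapes lists fields that every returned item must contain. The sketch below shows, in simplified form, how such lines could be pulled out of a docstring; extract_contracts is a hypothetical helper written for illustration, not Scrapy's implementation.

```python
import re

DOCSTRING = """
The lines below are a spider contract. For more info see:
http://doc.scrapy.org/en/latest/topics/contracts.html

@url http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/
@scrapes name
"""


def extract_contracts(docstring):
    """Return (name, arguments) pairs for every line that starts with '@'."""
    contracts = []
    for line in docstring.split('\n'):
        line = line.strip()
        if line.startswith('@'):
            match = re.match(r'@(\w+)\s*(.*)', line)
            if match:
                contracts.append(match.groups())
    return contracts


contracts = extract_contracts(DOCSTRING)
for name, args in contracts:
    print(name, '->', args)
```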

Example #4

File: dmoz.py Project: CrazyOrr/dirbot-db

    def parse(self, response):
        """
        The lines below are a spider contract. For more info see:
        http://doc.scrapy.org/en/latest/topics/contracts.html

        @url http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/
        @scrapes name
        """
        sites = response.xpath('//ul[@class="directory-url"]/li')

        for site in sites:
            il = WebsiteLoader(response=response, selector=site)
            il.add_xpath('name', 'a/text()')
            il.add_xpath('url', 'a/@href')
            # Raw string avoids the invalid "\s" escape; the pattern itself is unchanged.
            il.add_xpath('description', 'text()', re=r'-\s([^\n]*?)\n')
            yield il.load_item()