def test_extraction(self):
    """Extractor with default arguments should pick up every link in the fixture response."""
    lx = RegexLinkExtractor()
    extracted = lx.extract_links(self.response)
    # Expected links, in document order, as found in self.response's fixture HTML.
    expected = [
        Link(url='http://example.com/sample2.html', text=u'sample 2'),
        Link(url='http://example.com/sample3.html', text=u'sample 3 text'),
        Link(url='http://www.google.com/something', text=u''),
        Link(url='http://example.com/innertag.html', text=u'inner tag'),
    ]
    self.assertEqual(extracted, expected)
def test_link_wrong_href(self):
    """A malformed href (stray '[') must be skipped while valid neighbours are kept."""
    html = """ <a href="http://example.org/item1.html">Item 1</a> <a href="http://[example.org/item2.html">Item 2</a> <a href="http://example.org/item3.html">Item 3</a> """
    response = HtmlResponse("http://example.org/index.html", body=html)
    lx = RegexLinkExtractor()
    links = list(lx.extract_links(response))
    # Item 2's URL is unparseable, so only items 1 and 3 survive extraction.
    self.assertEqual(links, [
        Link(url='http://example.org/item1.html', text=u'Item 1', nofollow=False),
        Link(url='http://example.org/item3.html', text=u'Item 3', nofollow=False),
    ])
class MySpider(CrawlSpider):
    """Crawl spider that follows ALLOWED_RE links via a fallback chain of extractors."""

    name = 'recorder'
    allowed_domains = [DOMAIN]
    start_urls = ['http://' + DOMAIN]
    # Try the primary LinkExtractor first, falling back to the regex-based one;
    # every matched page is handed to parse_page and followed further.
    rules = [
        Rule(
            FallbackLinkExtractor([
                LinkExtractor(allow=ALLOWED_RE),
                RegexLinkExtractor(allow=ALLOWED_RE),
            ]),
            callback='parse_page',
            follow=True,
        ),
    ]

    def parse_page(self, response):
        # Intentionally a no-op: the crawl itself (which URLs get visited)
        # is what this spider is used to record.
        pass
def test_html_base_href(self):
    """Relative hrefs must resolve against the <base href>, not the response URL."""
    html = """ <html> <head> <base href="http://b.com/"> </head> <body> <a href="test.html"></a> </body> </html> """
    response = HtmlResponse("http://a.com/", body=html)
    lx = RegexLinkExtractor()
    links = list(lx.extract_links(response))
    # 'test.html' joins with the base tag's b.com, overriding the a.com response URL.
    self.assertEqual(links, [
        Link(url='http://b.com/test.html', text=u'', nofollow=False),
    ])