Python HtmlParserLinkExtractor Examples

Programming Language: Python

Namespace/Package Name: scrapy.linkextractors.htmlparser

Examples at hotexamples.com: 5

Five real-world Python examples of scrapy.linkextractors.htmlparser.HtmlParserLinkExtractor, extracted from open source projects.

Frequently Used Methods

HtmlParserLinkExtractor(2)

extract_links(2)

Example #1

File: test_linkextractors.py Project: cdingding/scrapy

 def test_extraction(self):
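     # Default arguments: the same URL may be returned more than once
     # (sample3.html appears twice, with different link text).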
     lx = HtmlParserLinkExtractor()
     self.assertEqual(lx.extract_links(self.response),
                      [Link(url='http://example.com/sample2.html', text=u'sample 2'),
                       Link(url='http://example.com/sample3.html', text=u'sample 3 text'),
                       Link(url='http://example.com/sample3.html', text=u'sample 3 repetition'),
                       Link(url='http://www.google.com/something', text=u''),
                       Link(url='http://example.com/innertag.html', text=u'inner tag'),])
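
The test above depends on a fixture (self.response) that this page does not show. As a rough illustration of the kind of extraction being asserted, here is a minimal sketch using only Python's standard library. It is a stand-in, not Scrapy's implementation: the names AnchorCollector and extract_anchor_links, the sample HTML, and the (url, text) tuple format are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects (absolute_url, text) pairs from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) still belongs to the open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            url = urljoin(self.base_url, self._href)
            self.links.append((url, "".join(self._text).strip()))
            self._href = None


def extract_anchor_links(base_url, html):
    """Return (url, text) pairs for every <a href=...> in html."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Assumed sample page; the real fixture used by the test is not shown here.
SAMPLE = """
<a href="sample2.html">sample 2</a>
<a href="/sample3.html">sample <b>3</b> text</a>
<a href="http://www.google.com/something"><img src="x.gif"></a>
"""

for url, text in extract_anchor_links("http://example.com/index.html", SAMPLE):
    print(url, repr(text))
```

Relative hrefs are resolved against the page URL, and a link that wraps only an image ends up with empty text, which mirrors the text=u'' entry in the expected list above.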

Example #2

File: test_linkextractors_deprecated.py Project: wkt2000/scrapy-1

 def test_extraction(self):
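     # Default arguments: the same URL may be returned more than once
     # (sample3.html appears twice, with different link text).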
     lx = HtmlParserLinkExtractor()
     self.assertEqual(lx.extract_links(self.response),
                      [Link(url='http://example.com/sample2.html', text=u'sample 2'),
                       Link(url='http://example.com/sample3.html', text=u'sample 3 text'),
                       Link(url='http://example.com/sample3.html', text=u'sample 3 repetition'),
                       Link(url='http://www.google.com/something', text=u''),
                       Link(url='http://example.com/innertag.html', text=u'inner tag'),])

Example #3

File: test_linkextractors.py Project: cdingding/scrapy

 def test_link_wrong_href(self):
     # Bytes body: a text (str) body would need an explicit encoding on Python 3.
     html = b"""
     <a href="http://example.org/item1.html">Item 1</a>
     <a href="http://[example.org/item2.html">Item 2</a>
     <a href="http://example.org/item3.html">Item 3</a>
     """
     response = HtmlResponse("http://example.org/index.html", body=html)
     lx = HtmlParserLinkExtractor()
     # Item 2 has a malformed href (unbalanced "[") and is expected to be skipped.
     self.assertEqual([link for link in lx.extract_links(response)], [
         Link(url='http://example.org/item1.html', text=u'Item 1', nofollow=False),
         Link(url='http://example.org/item3.html', text=u'Item 3', nofollow=False),
     ])
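
The skipped link in the test above can be reproduced with the standard library alone. The sketch below assumes the extractor drops any href that cannot be resolved against the page URL. The helper name resolve_valid is hypothetical, and this is an illustration of the behavior the test asserts, not Scrapy's own code.

```python
from urllib.parse import urljoin


def resolve_valid(base_url, hrefs):
    """Resolve hrefs against base_url, dropping any that cannot be parsed."""
    resolved = []
    for href in hrefs:
        try:
            resolved.append(urljoin(base_url, href))
        except ValueError:
            # An unbalanced "[" makes the host look like a broken IPv6
            # literal, so urllib refuses to parse it; skip that link.
            continue
    return resolved


hrefs = [
    "http://example.org/item1.html",
    "http://[example.org/item2.html",
    "http://example.org/item3.html",
]
print(resolve_valid("http://example.org/index.html", hrefs))
```

Skipping the bad href rather than raising keeps one malformed link from discarding every other link on the page.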

Example #4

File: test_linkextractors_deprecated.py Project: wkt2000/scrapy-1

 def test_link_wrong_href(self):
     # Bytes body: a text (str) body would need an explicit encoding on Python 3.
     html = b"""
     <a href="http://example.org/item1.html">Item 1</a>
     <a href="http://[example.org/item2.html">Item 2</a>
     <a href="http://example.org/item3.html">Item 3</a>
     """
     response = HtmlResponse("http://example.org/index.html", body=html)
     lx = HtmlParserLinkExtractor()
     # Item 2 has a malformed href (unbalanced "[") and is expected to be skipped.
     self.assertEqual([link for link in lx.extract_links(response)], [
         Link(url='http://example.org/item1.html', text=u'Item 1', nofollow=False),
         Link(url='http://example.org/item3.html', text=u'Item 3', nofollow=False),
     ])

Example #5

File: test_linkextractors_deprecated.py Project: RexMao/scrapy

 def test_extraction(self):
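     # Default arguments: the same URL may be returned more than once
     # (sample3.html appears twice, with different link text).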
     lx = HtmlParserLinkExtractor()
     self.assertEqual(
         lx.extract_links(self.response),
         [
             Link(url="http://example.com/sample2.html", text=u"sample 2"),
             Link(url="http://example.com/sample3.html", text=u"sample 3 text"),
             Link(url="http://example.com/sample3.html", text=u"sample 3 repetition"),
             Link(url="http://www.google.com/something", text=u""),
             Link(url="http://example.com/innertag.html", text=u"inner tag"),
         ],
     )