Python ScrapyExtractor Examples

Programming language: Python

Namespace/package name: COPY_scrapy_extractor

Class/type: ScrapyExtractor

Examples at hotexamples.com: 2

Python ScrapyExtractor - 2 examples found. These are the top-rated real-world Python examples of COPY_scrapy_extractor.ScrapyExtractor extracted from open source projects.

Frequently used methods

extract(2)

Example #1

File: Copy_testModule.py Project: Ravi-Kumar/TestingModules

def testLinkExtract(self):
    """Test recursive link extraction: extract links from a given page and follow them."""

    # Initialize the extractor from its YAML configuration
    configFile = 'linkExtractionTest.yml'
    extractor = ScrapyExtractor(configFile, self.urlUtil)

    # Create a link item
    item = LinkArrayItem()
    item.init()

    # Create an HtmlResponse object for performing XPath operations on it
    bodyForResponse = self.urlUtil.getUrlResponse(self.urlAddress)
    response = HtmlResponse(self.urlAddress, body=bodyForResponse)

    # Extract links from the HtmlResponse object
    extractedData = extractor.extract(response, item)

    # No reference data is loaded here; print the result for manual inspection
    print(extractedData)
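
The test above hands a page body to the extractor and gets link data back, but the listing does not show how ScrapyExtractor produces it. As a rough illustration of the general idea only, here is a minimal sketch that collects links from an HTML string with the standard library instead of Scrapy. `LinkCollector`, `extract_links`, and the sample HTML are hypothetical names and data introduced for this sketch; they are not part of the original project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    """Return absolute link URLs in document order (assumed behaviour)."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Made-up sample page, used only to exercise the sketch
sample_html = (
    '<html><body>'
    '<a href="/about">About</a> '
    '<a href="http://example.org/x">X</a>'
    '</body></html>'
)
links = extract_links("http://example.com/index.html", sample_html)
```

A recursive crawler would then fetch each returned URL and repeat the same step, which is presumably what "following" refers to in the docstring above.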

Example #2

File: Copy_testModule.py Project: Ravi-Kumar/TestingModules

def testImageExtract(self):
    """Test image extraction from a given page."""

    # Initialize the extractor from its YAML configuration
    configFile = 'imageExtractionTest.yml'
    extractor = ScrapyExtractor(configFile, self.urlUtil)

    # Create an image item
    item = ImageArrayItem()
    item.init()

    # Create an HtmlResponse object for performing XPath operations on it
    bodyForResponse = self.urlUtil.getUrlResponse(self.urlAddress)
    response = HtmlResponse(self.urlAddress, body=bodyForResponse)

    # Extract images from the HtmlResponse object
    extractedData = extractor.extract(response, item)

    # Load the reference data and verify the extracted data against it;
    # the with-block closes the file handle even if loading fails
    with open('correctImageData', 'r') as referenceFile:
        trueData = json.load(referenceFile)
    self.assertEqual(extractedData, trueData)
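
The last step of this test compares extracted data with a JSON reference file. The following is a minimal, self-contained sketch of that pattern, under the assumption that the reference file holds a plain JSON object; the real contents of `correctImageData` are not shown in the listing, so the data, `FixtureComparisonTest`, and `run_fixture_test` here are made up for illustration.

```python
import json
import os
import tempfile
import unittest


class FixtureComparisonTest(unittest.TestCase):
    """Hypothetical test showing comparison against a JSON reference file."""

    def test_matches_fixture(self):
        # Stand-in for the extractor output (assumed shape, not the real one)
        extracted = {"images": ["http://example.com/a.png",
                                "http://example.com/b.png"]}

        with tempfile.TemporaryDirectory() as tmp_dir:
            # Write a throwaway reference file so the sketch runs anywhere
            path = os.path.join(tmp_dir, "correctImageData")
            with open(path, "w") as reference_file:
                json.dump({"images": ["http://example.com/a.png",
                                      "http://example.com/b.png"]},
                          reference_file)

            # Load the reference data back, closing the file afterwards
            with open(path) as reference_file:
                expected = json.load(reference_file)

        # assertEqual reports a diff on failure, unlike assertTrue(a == b)
        self.assertEqual(extracted, expected)


def run_fixture_test():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FixtureComparisonTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_fixture_test()
```

Using `assertEqual` rather than `assertTrue(a == b)` does not change when the test passes, but a failure then shows which keys or values differ.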