Python webgraphItem Examples

Programming Language: Python

Namespace/Package Name: scrapy_webgraph.items

Method/Function: webgraphItem

Examples at hotexamples.com: 2

Python webgraphItem - 2 examples found. These are the top rated real-world Python examples of scrapy_webgraph.items.webgraphItem extracted from open source projects.

Example #1

File: webgraph_perso.py Project: brozi/webpage-graph

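 # Module-level imports this method relies on. HtmlXPathSelector and
 # urljoin_rfc belong to the legacy Scrapy API and were removed in later releases.
 from scrapy.selector import HtmlXPathSelector
 from scrapy.utils.url import urljoin_rfc
 from scrapy_webgraph.items import webgraphItem

 def parse_item(self, response):
     """Build a graph item: the page URL is the node, its in-site links are the edges."""
     hxs = HtmlXPathSelector(response)
     i = webgraphItem()
     i['node'] = response.url
     print("#######################")
     print(response.url)
     print("#######################")
     # i['http_status'] = response.status
     llinks = []
     for anchor in hxs.select('//a[@href]'):
         href = anchor.select('@href').extract()[0]
         # Keep only absolute links under the personal site; skip javascript: links.
         if (not href.lower().startswith("javascript")
                 and href.startswith("http://perso.ens-lyon.fr/baptiste.roziere/")):
             llinks.append(urljoin_rfc(response.url, href))
     i['edge'] = llinks
     return i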
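
Example #1 tests the raw href against the site prefix before joining it with the page URL, so relative links never reach the join and are dropped. If relative links should count as edges too, one option is to resolve first and filter afterwards. The following is a minimal sketch of that order using only the Python 3 standard library; `filter_links` and its parameters are hypothetical names chosen for illustration, not part of the brozi/webpage-graph project.

```python
from urllib.parse import urljoin


def filter_links(base_url, hrefs, prefix):
    """Resolve each href against base_url, then keep those under prefix."""
    links = []
    for href in hrefs:
        # Skip javascript: pseudo-links, mirroring the check in Example #1.
        if href.lower().startswith("javascript"):
            continue
        absolute = urljoin(base_url, href)
        # Filter on the resolved URL so relative links are not lost.
        if absolute.startswith(prefix):
            links.append(absolute)
    return links


if __name__ == "__main__":
    base = "http://perso.ens-lyon.fr/baptiste.roziere/index.html"
    prefix = "http://perso.ens-lyon.fr/baptiste.roziere/"
    hrefs = ["cv.html", "javascript:void(0)", "http://example.com/"]
    print(filter_links(base, hrefs, prefix))
```

Whether relative links should be followed is a design choice for the crawl; the original code keeps absolute links only.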

Example #2

File: webgraph.py Project: brozi/webpage-graph

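 # Module-level imports this method relies on. HtmlXPathSelector and
 # urljoin_rfc belong to the legacy Scrapy API and were removed in later releases.
 from scrapy.selector import HtmlXPathSelector
 from scrapy.utils.url import urljoin_rfc
 from scrapy_webgraph.items import webgraphItem

 def parse_item(self, response):
     """Build a graph item: the page URL is the node, its distinct in-site links are the edges."""
     hxs = HtmlXPathSelector(response)
     i = webgraphItem()
     i['node'] = response.url
     print("#######################")
     print(response.url)
     print("#######################")
     # i['http_status'] = response.status
     llinks = []
     seen = set()  # hrefs already added for this page
     for anchor in hxs.select('//a[@href]'):
         href = anchor.select('@href').extract()[0]
         # Keep each absolute cdiscount.com link once, in first-seen order.
         if href.startswith("http://www.cdiscount.com") and href not in seen:
             seen.add(href)
             llinks.append(urljoin_rfc(response.url, href))
     i['edge'] = llinks
     return i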
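
Example #2 removes repeated links within one page by remembering each href it has already seen. The same idea can be sketched without Scrapy, using a set for membership tests and a list to preserve first-seen order. `unique_links` and its parameters are hypothetical names; unlike the original, this sketch deduplicates on the resolved URL, which is an assumption about the intended behavior.

```python
from urllib.parse import urljoin


def unique_links(base_url, hrefs, prefix):
    """Return resolved URLs under prefix, in first-seen order, without repeats."""
    seen = set()
    links = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        # Compare resolved URLs so "/home" and its absolute form count once.
        if absolute.startswith(prefix) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


if __name__ == "__main__":
    base = "http://www.cdiscount.com/a/page.html"
    hrefs = ["/home", "http://www.cdiscount.com/home", "http://other.example/"]
    print(unique_links(base, hrefs, "http://www.cdiscount.com"))
```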