Python url_to_lru Examples

Programming Language: Python

Namespace/Package Name: hcicrawler.lru

Method/Function: url_to_lru

Examples at hotexamples.com: 4

Python url_to_lru - 4 examples found. These are the top rated real world Python examples of hcicrawler.lru.url_to_lru extracted from open source projects. You can rate examples to help us improve the quality of examples.

Example #1

Show file

File: pages.py Project: heikkidoeleman/Hypertext-Corpus-Initiative

 def parse_html(self, response, lru):
     depth = response.meta['depth']
     lrulinks = []
     for link in self.link_extractor.extract_links(response):
         try:
             lrulink = url_to_lru(link.url)
         except ValueError, e:
             self.log("Error converting URL to LRU: %s" % e, log.ERROR)
             continue
         lrulinks.append(lrulink)
         if self._should_follow(depth, lru, lrulink) and \
                 not url_has_any_extension(link.url, self.ignored_exts):
             yield Request(link.url, callback=self.parse)

Example #2

Show file

File: pages.py Project: heikkidoeleman/Hypertext-Corpus-Initiative

 def parse_html(self, response, lru):
     depth = response.meta['depth']
     lrulinks = []
     for link in self.link_extractor.extract_links(response):
         try:
             lrulink = url_to_lru(link.url)
         except ValueError, e:
             self.log("Error converting URL to LRU: %s" % e, log.ERROR)
             continue
         lrulinks.append(lrulink)
         if self._should_follow(depth, lru, lrulink) and \
                 not url_has_any_extension(link.url, self.ignored_exts):
             yield Request(link.url, callback=self.parse)

Example #3

Show file

File: pages.py Project: heikkidoeleman/Hypertext-Corpus-Initiative

 def parse(self, response):
     lru = url_to_lru(response.url)
     if isinstance(response, HtmlResponse):
         return self.parse_html(response, lru)
     else:
         return self._make_raw_page(response, lru)

Example #4

Show file

File: pages.py Project: heikkidoeleman/Hypertext-Corpus-Initiative

 def parse(self, response):
     lru = url_to_lru(response.url)
     if isinstance(response, HtmlResponse):
         return self.parse_html(response, lru)
     else:
         return self._make_raw_page(response, lru)