def parse(self, response):
    """Parse a listing page.

    Extract the event list from the page, filter out events that were
    already scraped on a previous run, and yield a follow-up request
    for each newly found event.

    :param response: scrapy Response for the event-listing page
    :yields: scrapy.Request for each event detail page not seen before
    """
    # Load previously scraped URLs; use a set so the per-event
    # membership test below is O(1) instead of O(n) on a list.
    existing_urls = set(load_existing_urls(self.name))

    # Each event sits in a "views-row" container inside the page wrapper.
    events_element = response.xpath(
        '//div[@class="wrapper clearfix "]//div[contains(@class, "views-row")]'
    )
    self.logger.info('Found %d events on url %s',
                     len(events_element), response.url)

    for event in events_element:
        title = event.xpath(
            './/div[@class="views-field-title"]//a/text()'
        ).extract_first()
        # Event links are relative; resolve against the listing page URL.
        url = urlparse.urljoin(
            response.url,
            event.xpath(
                './/div[@class="views-field-title"]//a/@href'
            ).extract_first(),
        )
        if url not in existing_urls:
            category = event.xpath(
                './/div[@class="views-field-field-event-category-value-1"]//a/text()'
            ).extract_first()
            # Pass the category through meta so the detail-page
            # callback can attach it to the parsed event.
            yield Request(url,
                          meta={"category": category},
                          callback=self.parse_one_event)
        else:
            self.logger.info('Event %s with url %s has already been parsed',
                             title, url)
def __init__(self):
    """Initialize the spider.

    Preloads the set of already-scraped URLs so ``parse`` can skip
    events that were handled on a previous run.
    """
    # Cache previously seen URLs once, at construction time.
    self.existing_urls = load_existing_urls(self.name)
def __init__(self):
    """Initialize the whatshappen spider.

    Loads the URLs scraped on earlier runs so duplicate events can be
    filtered out during parsing.
    """
    # Previously scraped URLs for this spider, keyed by spider name.
    self.existing_urls = load_existing_urls(self.name)
def __init__(self):
    """Initialize the eventfinda spider.

    Fetches the URLs already scraped for this source so the parse
    callbacks can skip known events.
    """
    # Load the existing-URL cache for the eventfinda source.
    self.existing_urls = load_existing_urls(self.name)