Python Spider Examples

Programming language: Python

Namespace/package name: pattern.web

Class/type: Spider

Examples at hotexamples.com: 9

Python Spider - 9 examples found. These are the top-rated real-world Python examples of pattern.web.Spider, extracted from open source projects. You can rate examples to help improve their quality.

Frequently used methods

__init__(4)

priority(1)

Example #1

File: 13-spider.py Project: julosaure/pattern

 def priority(self, link, method=DEPTH):
     if "?" in link.url:
         # This ignores links with a querystring.
         return 0.0
     else:
         # Otherwise use the default priority ranker,
         # i.e. the priority depends on DEPTH or BREADTH crawl mode.
         return Spider.priority(self, link, method)
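
The querystring check above can be tried without installing pattern. The following is a minimal sketch, not part of the original project: `has_querystring` and `querystring_priority` are hypothetical helper names, and the sketch uses `urllib.parse.urlsplit` from the standard library rather than the substring test `"?" in link.url`. One difference to be aware of: a URL that ends in a bare `?` has an empty query component, so this version does not treat it as having a querystring, whereas the substring test would.

```python
from urllib.parse import urlsplit


def has_querystring(url):
    """Return True if the URL carries a non-empty query component."""
    return bool(urlsplit(url).query)


def querystring_priority(url, default=1.0):
    # Hypothetical stand-in for the override above: 0.0 for URLs with a
    # querystring, otherwise a caller-supplied default priority.
    return 0.0 if has_querystring(url) else default
```

In a real subclass, the `default` branch would presumably delegate to `Spider.priority(self, link, method)` as the example does.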

Example #2

 def priority(self, link, method=DEPTH):
     if "?" in link.url:
         # This ignores links with a querystring.
         return 0.0
     else:
         # Otherwise use the default priority ranker,
         # i.e. the priority depends on DEPTH or BREADTH crawl mode.
         return Spider.priority(self, link, method)

Example #3

File: spiders.py Project: Carlosmr/WhooshSearcher

 def priority(self, link, method=DEPTH):
     # Follow only URLs with a date path such as /2014/jan/05/, skipping "media" URLs.
     match = re.search(r"/\d{4}/\w{3}/\d{2}/", link.url)
     if match:
         if re.search("media", link.url):
             res = 0.0
         else:
             res = Spider.priority(self, link, method)
     else:
         res = 0.0
     return res
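
To see which URLs this filter lets through, the same logic can be run on plain strings. This is a minimal sketch under stated assumptions: `is_wanted` is a hypothetical helper name, and the URLs in the usage note are made up for illustration, not taken from the project.

```python
import re

# Same date-path pattern as the example, written as a raw string:
# four digits, a three-character word (e.g. a month abbreviation), two digits.
DATE_PATH = re.compile(r"/\d{4}/\w{3}/\d{2}/")


def is_wanted(url):
    """Hypothetical helper: True for dated URLs that do not mention "media"."""
    if not DATE_PATH.search(url):
        return False
    return "media" not in url
```

For instance, `is_wanted("http://www.theguardian.com/world/2014/jan/05/story")` is true, while a URL with no date path, or one that contains `media` anywhere in it, is rejected.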

Example #4

File: spiders.py Project: Carlosmr/WhooshSearcher

 def __init__(self, whoosh):
     Spider.__init__(self, links=["http://www.theguardian.com/"],
                     domains=["www.theguardian.com"], delay=0.0)
     self.whoosh = whoosh

Example #5

File: spiders.py Project: Carlosmr/WhooshSearcher

 def priority(self, link, method=DEPTH):
     # Follow only dated article URLs (/YYYY/MM/DD/) on huffingtonpost.co.uk.
     match = re.search(r"huffingtonpost\.co\.uk/\d{4}/\d{2}/\d{2}/", link.url)
     if match:
         return Spider.priority(self, link, method)
     else:
         return 0.0

Example #6

File: spiders.py Project: Carlosmr/WhooshSearcher

 def priority(self, link, method=DEPTH):
     # Follow only dated article URLs (/article/YYYY/MM/DD/) on in.reuters.com.
     match = re.search(r"in\.reuters\.com/article/\d{4}/\d{2}/\d{2}/", link.url)
     if match:
         return Spider.priority(self, link, method)
     else:
         return 0.0
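
When a hostname is embedded in a regular expression, an unescaped `.` matches any character, so the filter can accept hosts it was not meant for. The following is a minimal sketch of building such a pattern with `re.escape`; `dated_article_pattern` is a hypothetical helper name and is not part of the original project.

```python
import re


def dated_article_pattern(host, prefix="/"):
    """Hypothetical helper: compile a pattern for <host><prefix>YYYY/MM/DD/.

    re.escape() makes the dots in the hostname match literally.
    """
    return re.compile(re.escape(host) + re.escape(prefix) + r"\d{4}/\d{2}/\d{2}/")


# Usage: compare a strict pattern with one whose dots are left unescaped.
strict = dated_article_pattern("in.reuters.com", "/article/")
loose = re.compile(r"in.reuters.com/article/\d{4}/\d{2}/\d{2}/")
```

Here `loose` also matches a made-up host such as `inxreutersxcom`, while `strict` does not.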

Example #7

File: spiders.py Project: Carlosmr/WhooshSearcher

 def __init__(self, whoosh):
     Spider.__init__(self, links=["http://www.huffingtonpost.co.uk/"],
                     domains=["huffingtonpost.co.uk"], delay=0.0)
     self.whoosh = whoosh

Example #8

File: spiders.py Project: Carlosmr/WhooshSearcher

 def __init__(self, whoosh):
     Spider.__init__(self, links=["http://in.reuters.com/"],
                     domains=["in.reuters.com"], delay=0.0)
     self.whoosh = whoosh

Example #9

File: spider.py Project: Carlosmr/NaturalLanguageProccessing

 def __init__(self, links, domains, delay, whoosh):
     Spider.__init__(self, links=links, domains=domains, delay=delay)
     self.whoosh = whoosh