Example #1
    # Required imports (not shown in this excerpt):
    #   from scrapy.spiders import CrawlSpider, Rule
    #   from scrapy.linkextractors import LinkExtractor
    # ReadSetting and LinkMatrix are project-local helpers.
    def __init__(self):
        rs = ReadSetting()  # read the crawl settings
        self.start_urls = rs.readurl()
        self.linkmatrix = LinkMatrix(rs.projectname())
        self.linkmatrix.setroot(self.start_urls)

        self.allowed_domains = rs.readalloweddomain()
        self.xpath = rs.readxpath()
        self.rules = [Rule(LinkExtractor(), follow=True, callback="parse_start_url")]
        # Crawl rule: follow every extracted URL; requests outside the allowed
        # domains are filtered out by the spider middlewares; each resulting
        # response is passed to parse_start_url.
        # Every Request passes through the spider middlewares.

        super(XpathSpider, self).__init__()
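
The parse_start_url callback named in the rule is defined elsewhere in the spider. As a minimal sketch of what it might do, assuming self.xpath holds an XPath expression selecting the content of interest (the yielded field names are illustrative, not from the source):

    def parse_start_url(self, response):
        # Hedged sketch: extract nodes with the configured XPath expression
        # and yield one item per node; the dict keys are assumed names.
        for node in response.xpath(self.xpath):
            yield {'url': response.url, 'text': node.get()}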
Example #2
    # Required imports (not shown in this excerpt):
    #   import re
    #   from scrapy.spiders import CrawlSpider, Rule
    #   from scrapy.linkextractors import LinkExtractor
    def __init__(self):
        rs = ReadSetting()  # read the crawl settings
        self.start_urls = rs.readurl()
        self.linkmatrix = LinkMatrix(rs.projectname())
        self.linkmatrix.setroot(self.start_urls)

        self.allowed_domains = rs.readalloweddomain()
        self.allow, self.deny = rs.readurlmatch()

        # Build one alternation regex per list, escaping each literal pattern.
        self.regex_allow = re.compile('({0})'.format('|'.join(
            re.escape(e) for e in self.allow)))
        self.regex_deny = re.compile('({0})'.format('|'.join(
            re.escape(e) for e in self.deny)))

        self.rules = [
            Rule(LinkExtractor(), follow=True, callback="parse_match")
        ]
        # Crawl rule: follow every extracted URL; requests outside the allowed
        # domains are filtered out by the spider middlewares; each resulting
        # response is passed to parse_match.
        # Every Request passes through the spider middlewares.

        super(MatchSpider, self).__init__()
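
The allow/deny matching these compiled patterns support can be exercised on its own. A standalone sketch, with made-up patterns and URLs standing in for what rs.readurlmatch() would return:

import re

allow = ['example.com/docs', 'example.com/blog']  # stand-ins for rs.readurlmatch()
deny = ['logout', '?sort=']

regex_allow = re.compile('({0})'.format('|'.join(re.escape(e) for e in allow)))
regex_deny = re.compile('({0})'.format('|'.join(re.escape(e) for e in deny)))

for url in ['http://example.com/docs/intro',
            'http://example.com/blog?sort=new',
            'http://example.com/logout']:
    keep = bool(regex_allow.search(url)) and not regex_deny.search(url)
    print(url, '->', 'keep' if keep else 'drop')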