def _restrict_content_type(self, curi):
    """
    Decide whether the `CrawlUri` should be processed at all.

    Returns ``True`` iff the Content-Type extracted from *curi* is one of
    the (X)HTML / WML types this extractor knows how to parse.
    """
    (ctype, _enc) = get_content_type_encoding(curi)
    return ctype in ("text/html", "application/xhtml", "text/vnd.wap.wml",
                     "application/vnd.wap.wml", "application/vnd.wap.xhtm")
def __call__(self, curi):
    """
    Actually extract links from the html content if the content type matches.

    Returns the (possibly updated) *curi*: untouched when the content type
    is not one we handle or when link extraction was already marked as
    finished; otherwise after running every tag found by the
    ``_tag_extractor`` regex through the meta / generic-tag processors.
    """
    # Skip content types we cannot parse (see _restrict_content_type).
    if not self._restrict_content_type(curi):
        return curi
    # Skip URIs where an earlier stage already finished link extraction.
    if CURI_EXTRACTION_FINISHED in curi.optional_vars and \
            curi.optional_vars[CURI_EXTRACTION_FINISHED] == CURI_OPTIONAL_TRUE:
        return curi

    (_type, encoding) = get_content_type_encoding(curi)
    try:
        content = curi.content_body.decode(encoding)
    except Exception:
        # Best effort: on any decode failure fall back to the raw body
        # (extraction then operates on the undecoded bytes).
        content = curi.content_body

    parsed_url = urlparse.urlparse(curi.url)
    # Remember the page URL as base for resolving relative links.
    self._base_url = curi.url

    # iterate over all tags
    # NOTE(review): the group numbers below (1, 3, 5, 6, 7, 8) are tied to
    # the structure of the ``self._tag_extractor`` pattern, which is defined
    # elsewhere — presumably group 1 = <script>, 3 = <style>, 5 = tag body,
    # 6 = tag name, 7 = meta marker, 8 = comment; confirm against the regex.
    # ``match.start(n)`` is -1 for a non-participating group, so ``> 0``
    # also treats a group matching at offset 0 as absent — assumed to be
    # impossible here because every group sits after a leading '<'.
    for tag in self._tag_extractor.finditer(content):
        if tag.start(8) > 0:
            # a html comment, ignore
            continue
        elif tag.start(7) > 0:
            # a meta tag
            curi = self._process_meta(curi, parsed_url, content,
                    (tag.start(5), tag.end(5)))
        elif tag.start(5) > 0:
            # generic <whatever tag
            curi = self._process_generic_tag(curi, parsed_url, content,
                    (tag.start(6), tag.end(6)),
                    (tag.start(5), tag.end(5)))
        elif tag.start(1) > 0:
            # <script> tag
            # TODO no script handling so far
            pass
        elif tag.start(3) > 0:
            # <style> tag
            # TODO no tag handling so far
            pass

    return curi