def lang_independent_sub(self, text):
    """Perform the language-independent string substitutions.

    The regex order looks odd: it would be cleaner to XML-unescape
    *after* STRIP_EOL_HYPHEN, but this deliberately mirrors the
    original NIST implementation, so the order is:
    STRIP_SKIP -> unescape -> STRIP_EOL_HYPHEN.

    :param text: the input text to normalize.
    :return: the text after all three substitution steps.
    """
    skip_re, skip_repl = self.STRIP_SKIP
    text = skip_re.sub(skip_repl, text)
    text = xml_unescape(text)
    hyphen_re, hyphen_repl = self.STRIP_EOL_HYPHEN
    text = hyphen_re.sub(hyphen_repl, text)
    return text
def international_tokenize(
    self, text, lowercase=False, split_non_ascii=True, return_str=False
):
    """Tokenize *text* with the international (Unicode-aware) regexes.

    Unlike the 'normal' ``tokenize()``, STRIP_EOL_HYPHEN is applied
    *before* XML-unescaping here.

    :param text: the input text; coerced to a text string first.
    :param lowercase: if True, lowercase before applying the regexes.
    :param split_non_ascii: accepted for interface compatibility; not
        consulted in this body — presumably used elsewhere in the
        class (TODO confirm against the full source).
    :param return_str: if True, return the normalized string itself;
        otherwise return the list of whitespace-split tokens.
    :return: a string or a list of tokens, depending on *return_str*.
    """
    text = text_type(text)

    # Strip <skipped> markers, then end-of-line hyphens, THEN unescape.
    for pattern, replacement in (self.STRIP_SKIP, self.STRIP_EOL_HYPHEN):
        text = pattern.sub(replacement, text)
    text = xml_unescape(text)

    if lowercase:
        text = text.lower()

    for pattern, replacement in self.INTERNATIONAL_REGEXES:
        text = pattern.sub(replacement, text)

    # Collapse runs of whitespace to single spaces and trim both ends
    # (no-arg str.split already discards leading/trailing whitespace).
    text = ' '.join(text.split())
    return text if return_str else text.split()