Python remove_special_chars examples

Programming language: Python

Namespace/package name: ferret.cleaner.text

Method/function: remove_special_chars

Examples at hotexamples.com: 6

Python remove_special_chars - 6 examples found. These are the top-rated real-world Python examples of ferret.cleaner.text.remove_special_chars, extracted from open source projects.

Example #1

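    def _get_anchor_ratio(self, tag):
        try:
            text_length = len(remove_special_chars(tag.text))
            # Total length of the cleaned text inside all <a> descendants.
            anchors_length = sum(
                len(remove_special_chars(a.text)) for a in tag.find_all('a'))

            # Guard both lengths so an empty tag cannot raise ZeroDivisionError.
            if anchors_length == 0 or text_length == 0:
                return 0
            return round(anchors_length / float(text_length), 4)
        except AttributeError:
            return 0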

Example #2

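    def _get_punctuation_ratio(self, tag):
        try:
            tag_text = remove_special_chars(tag.text)

            words_count = len(tag_text.split())
            # Count occurrences of common sentence punctuation.
            punct_count = sum(
                tag_text.count(symbol)
                for symbol in ['.', ',', '!', '?', ':', ';'])

            if words_count == 0:
                return 0
            # Punctuation marks per word, rounded to four decimal places.
            return round(punct_count / float(words_count), 4)
        except AttributeError:
            # The tag has no usable .text attribute (for example, it is None).
            return 0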
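
The ratio computed in Example #2 can be tried on plain strings without BeautifulSoup. The following is a minimal sketch under stated assumptions: `remove_special_chars_stub` is a hypothetical stand-in (the real `ferret.cleaner.text.remove_special_chars` is not shown on this page and may behave differently), and `punctuation_ratio` is an illustrative name, not part of ferret.

```python
import re

PUNCTUATION = ['.', ',', '!', '?', ':', ';']


def remove_special_chars_stub(text):
    """Hypothetical stand-in; NOT ferret's actual implementation."""
    # Assumption: replace control characters (tabs, newlines, ...) with spaces,
    # then collapse runs of spaces and trim the ends.
    text = re.sub(r'[\x00-\x1f\x7f]', ' ', text)
    return re.sub(r' +', ' ', text).strip()


def punctuation_ratio(text):
    """Punctuation marks per word, rounded to four decimal places."""
    cleaned = remove_special_chars_stub(text)
    words_count = len(cleaned.split())
    if words_count == 0:
        return 0
    punct_count = sum(cleaned.count(symbol) for symbol in PUNCTUATION)
    return round(punct_count / float(words_count), 4)
```

With this stand-in, `punctuation_ratio("Hello, world!")` counts two punctuation marks over two words, and an empty string yields 0 because of the word-count guard.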

Example #3

    def extract(self):
        body = BeautifulSoup(self.context.get('html'), 'html5lib').body
        body = self._remove_unwanted_tags(body)
        body = self._remove_comments(body)
        body = self._convert_elements_to_paragraph(body)
        body = self._label_tags_with_scores(body)
        body = self._choose_by_density(body)
        body = self._remove_by_score(body)
        body = self._remove_noisy_tags(body)
        body = self._remove_redundant_blocks(body)
        body = self._remove_unwanted_tags(body)
        body = self._clean_scores(body)
        body = self._clean_up_attributes(body)
        body = self._remove_title_from_text(body, self.context.get('title'))
        body = self._fix_image_paths(body)
        return remove_special_chars(str(body))

Example #4

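from bs4 import BeautifulSoup
from ferret.cleaner.text import remove_special_chars

def simple_clean(html):
    body = BeautifulSoup(html, 'lxml').body
    # Drop elements that carry no readable content.
    for elem in body.select('script,style,link,source'):
        elem.extract()
    return remove_special_chars(str(body))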

Example #5

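from bs4 import BeautifulSoup
from ferret.cleaner.text import remove_special_chars

def extract_body_text_from_html(html):
    body = BeautifulSoup(html, 'lxml').body
    # Drop elements that carry no readable content before taking the text.
    for elem in body.select('script,style,link,source'):
        elem.extract()
    return remove_special_chars(str(body.get_text()))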

Example #6

def test_removal_of_special_characters(text, expected):
    actual = remove_special_chars(text)
    assert actual == expected