Python replace_emails Examples

Programming language: Python

Namespace/package name: textacy.preprocess

Method/function: replace_emails

Examples at hotexamples.com: 6

Python replace_emails - 6 examples found. These are the top-rated real-world Python examples of textacy.preprocess.replace_emails, extracted from open source projects.
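For orientation, the behaviour these examples rely on can be approximated with the standard library alone. The sketch below is not textacy's implementation: `EMAIL_PATTERN` and `replace_emails_sketch` are hypothetical names, and the regular expression is a deliberately simplified assumption that will miss many valid addresses. The `*EMAIL*` placeholder mirrors the one passed explicitly in the test examples on this page.

```python
import re

# Simplified, hypothetical pattern: local part, "@", then a domain with at least one dot.
# Real-world email matching (including whatever textacy uses) is more involved.
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def replace_emails_sketch(text, replace_with="*EMAIL*"):
    """Replace every substring that looks like an email address."""
    return EMAIL_PATTERN.sub(replace_with, text)


print(replace_emails_sketch("Write to someone@example.org or a.b+c@mail.example.co.uk."))
# Write to *EMAIL* or *EMAIL*.
```

Passing `replace_with=""` removes the matches instead of substituting a placeholder, which is the variant several of the examples use.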

Example #1

File: corpus_building.py Project: uk-gov-mirror/ukgovdatascience.govuk-lda-tagger-lite

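from textacy import preprocess  # assumed import; the module is textacy.preprocess


def preprocess_unicode(raw_text):
    """Lowercase and transliterate the text, then strip URLs, emails,
    phone numbers, numbers and currency symbols."""
    raw_text = preprocess.transliterate_unicode(raw_text.lower())
    # An empty replace_with removes each match instead of inserting a placeholder.
    raw_text = preprocess.replace_urls(raw_text, replace_with=u'')
    raw_text = preprocess.replace_emails(raw_text, replace_with=u'')
    raw_text = preprocess.replace_phone_numbers(raw_text, replace_with=u'')
    raw_text = preprocess.replace_numbers(raw_text, replace_with=u'')
    raw_text = preprocess.replace_currency_symbols(raw_text, replace_with=u'')
    return raw_text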

Example #2

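    # Assumes module-level imports: re, emoji, BeautifulSoup (from bs4)
    # and preprocess (from textacy), plus a SMILEY dict on the class.
    def clean_tweet(self, text):
        # FIX BROKEN UNICODE
        text = preprocess.fix_bad_unicode(text)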

        # GET TEXT ONLY FROM HTML
        text = BeautifulSoup(text, features='lxml').getText()
        # UN-PACK CONTRACTIONS
        text = preprocess.unpack_contractions(text)

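        # REPLACE URLS (no replace_with given, so the library default is used)
        text = preprocess.replace_urls(text)

        # REPLACE EMAILS (library default replacement)
        text = preprocess.replace_emails(text)

        # REPLACE PHONE NUMBERS (library default replacement)
        text = preprocess.replace_phone_numbers(text)

        # REPLACE NUMBERS (library default replacement)
        text = preprocess.replace_numbers(text)

        # REPLACE CURRENCY SYMBOLS (library default replacement)
        text = preprocess.replace_currency_symbols(text)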

        # REMOVE ACCENTS
        text = preprocess.remove_accents(text)

        # CONVERT EMOJIS TO TEXT
        words = text.split()
        reformed = [
            self.SMILEY[word] if word in self.SMILEY else word
            for word in words
        ]
        text = " ".join(reformed)
        text = emoji.demojize(text)
        text = text.replace(":", " ")
        text = ' '.join(text.split())

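        # SPLIT ATTACHED WORDS (keeps only runs that start at an uppercase
        # letter; any text before the first capital is discarded)
        text = ' '.join(re.findall('[A-Z][^A-Z]*', text))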

        # SPLIT UNDERSCORE WORDS
        text = text.replace('_', ' ')

        # REMOVE PUNCTUATION
        text = preprocess.remove_punct(text)

        # REMOVE ANY REMAINING DIGITS
        text = re.sub(r'\d', '', text)

        # REMOVE WORDS LESS THAN 3 CHARACTERS
        text = re.sub(r'\b\w{1,2}\b', '', text)

        # NORMALIZE WHITESPACE
        text = preprocess.normalize_whitespace(text)

        return text
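The "split attached words" step in `clean_tweet` uses `re.findall('[A-Z][^A-Z]*', text)`, and it may help to see what that expression keeps and drops. The sketch below reproduces that call in isolation; `split_on_capitals` and `split_camel_case` are hypothetical helper names, and the second function is only one possible alternative, not something taken from the original project.

```python
import re


def split_on_capitals(text):
    # Same expression as in clean_tweet: each match starts at an uppercase letter.
    return " ".join(re.findall("[A-Z][^A-Z]*", text))


def split_camel_case(text):
    # Hypothetical alternative: insert a space at each lowercase-to-uppercase boundary.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)


print(split_on_capitals("GoodMorning"))    # Good Morning
print(split_on_capitals("so HappyToday"))  # Happy Today
print(split_on_capitals("all lowercase"))  # prints an empty line
print(split_camel_case("so HappyToday"))   # so Happy Today
```

With the `findall` form, input that contains no uppercase letter comes back empty, so whether that is acceptable depends on the data being cleaned.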

Example #3

    def clean_text(self, raw_text):
        raw_text = self.strip_tags(raw_text)
        raw_text = raw_text.lower()
        # NOTE: punctuation is stripped before the URL/email replacers run, so
        # patterns that depend on "@" or "." may no longer match below.
        raw_text = preprocess.remove_punct(raw_text)
        raw_text = preprocess.transliterate_unicode(raw_text)
        raw_text = preprocess.replace_urls(raw_text, replace_with='')
        raw_text = preprocess.replace_emails(raw_text, replace_with='')
        raw_text = preprocess.replace_phone_numbers(raw_text, replace_with='')
        raw_text = preprocess.replace_numbers(raw_text, replace_with='')
        raw_text = preprocess.replace_currency_symbols(raw_text,
                                                       replace_with='')
        return raw_text
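In `clean_text` above, `remove_punct` runs before the URL and email replacers, and the order of such steps can change the result. The following stdlib-only sketch illustrates the effect under stated assumptions: `strip_punct`, `drop_emails` and `squeeze_spaces` are hypothetical stand-ins, not textacy functions, the email pattern is simplified, and this `strip_punct` deletes punctuation outright, which may differ from what textacy's `remove_punct` does.

```python
import re
import string

# Simplified, hypothetical email pattern (not textacy's).
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punct(text):
    # Delete every ASCII punctuation character, including "@" and ".".
    return text.translate(PUNCT_TABLE)


def drop_emails(text):
    # Remove anything that matches the simplified email pattern.
    return EMAIL_PATTERN.sub("", text)


def squeeze_spaces(text):
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())


sample = "contact admin@example.org today"

punct_first = squeeze_spaces(drop_emails(strip_punct(sample)))
emails_first = squeeze_spaces(strip_punct(drop_emails(sample)))

print(punct_first)   # contact adminexampleorg today
print(emails_first)  # contact today
```

In this sketch, removing emails first leaves clean text, while stripping punctuation first leaves the address behind as a single fused token.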

Example #4

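def test_replace_emails():
    # The address was obfuscated in the scraped source ("[email protected]");
    # username@example.com is a placeholder.
    text = "I can be reached at username@example.com through next Friday."
    proc_text = "I can be reached at *EMAIL* through next Friday."
    assert preprocess.replace_emails(text, "*EMAIL*") == proc_text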

Example #5

File: test_preprocess.py Project: winstonewert/textacy

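    def test_replace_emails(self):
        # The address was obfuscated in the scraped source ("[email protected]");
        # username@example.com is a placeholder.
        text = "I can be reached at username@example.com through next Friday."
        proc_text = "I can be reached at *EMAIL* through next Friday."
        self.assertEqual(preprocess.replace_emails(text, '*EMAIL*'), proc_text)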

Example #6

File: test_preprocess.py Project: EricSchles/textacy

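    def test_replace_emails(self):
        # The address was obfuscated in the scraped source ("[email protected]");
        # username@example.com is a placeholder.
        text = "I can be reached at username@example.com through next Friday."
        proc_text = "I can be reached at *EMAIL* through next Friday."
        self.assertEqual(preprocess.replace_emails(text, '*EMAIL*'), proc_text)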