Python cut_repeat Examples

Programming language: Python

Namespace/package name: pytypo

Method/function: cut_repeat

Examples on hotexamples.com: 2

Python cut_repeat - 2 examples found. These are the top-rated real-world Python examples of pytypo.cut_repeat, extracted from open source projects.

Example #1

File: Twitter_Sentiment_Analysis.py Project: NevineMGouda/Twitter-Sentiment-Analysis

import pytypo

# lower_case, remove_punc_keep_emoj, stem_words and STOPWORDS are defined
# elsewhere in the project and are not part of this excerpt.


def pre_process(rdd):
    # rdd holds (sentiment, tweet) pairs; mapValues transforms only the tweet.
    # The tuple-parameter lambdas of the Python 2 original, such as
    # lambda (sentiment, tweet): ..., are a syntax error in Python 3.

    # Tokenize each tweet on single spaces.
    # Alternative: rdd = rdd.mapValues(word_tokenize)
    rdd = rdd.mapValues(lambda tweet: tweet.strip().split(" "))

    # Lowercase each word, giving pairs such as
    # (0, ['this', 'is', 'a', 'lowercased', 'tweet']).
    rdd = rdd.mapValues(lambda tweet: lower_case(tweet=tweet))

    # Remove punctuation from a tweet.
    # Example: (0, ["is,", "so", "sad", "for", "my", "APL", "friend", "............."])
    # is mapped to (0, ["is", "so", "sad", "for", "my", "APL", "friend"]).
    rdd = rdd.mapValues(lambda tweet: remove_punc_keep_emoj(tweet=tweet))

    # Stem each word, for example "missing" or "missed" -> "miss".
    rdd = rdd.mapValues(lambda tweet: [stem_words(word=word) for word in tweet])

    # Remove stop words such as: a, I, and, all, once.
    rdd = rdd.mapValues(lambda tweet: [word for word in tweet if word not in STOPWORDS])

    # Shorten elongated words by capping each run of a repeated character at 3.
    # Example: "cooooollllll" is mapped to "cooolll".
    rdd = rdd.mapValues(lambda tweet: [pytypo.cut_repeat(word, 3) for word in tweet])
    return rdd
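
The pair-by-pair flow of pre_process can be tried without Spark. The following is a minimal sketch under stated assumptions: map_values, pre_process_local and the small STOPWORDS set are hypothetical stand-ins written for this illustration, not code from the project, and the punctuation, stemming and cut_repeat steps are left out because their helpers are not shown in the excerpt.

```python
# Plain-Python stand-in for a pair RDD: a list of (sentiment, tweet) tuples.
# STOPWORDS is a tiny hypothetical set, not the project's actual list.
STOPWORDS = {"a", "i", "and", "all", "once", "is", "so", "for", "my"}


def map_values(pairs, fn):
    """Apply fn to the second element of each pair, keeping the key as-is."""
    return [(key, fn(value)) for key, value in pairs]


def pre_process_local(pairs):
    # Tokenize on single spaces after trimming, as the excerpt does.
    pairs = map_values(pairs, lambda tweet: tweet.strip().split(" "))
    # Lowercase every token.
    pairs = map_values(pairs, lambda tokens: [t.lower() for t in tokens])
    # Drop stop words.
    pairs = map_values(
        pairs, lambda tokens: [t for t in tokens if t not in STOPWORDS]
    )
    return pairs


sample = [(0, " Is so sad for my APL friend "), (1, "All good once again")]
result = pre_process_local(sample)
print(result)
```

Each step rebinds the same variable to a new list of pairs, so the sentiment label travels through every stage untouched while only the token list changes.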

Example #2

File: test_pytypo.py Project: safiqul2212/pytypo

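import pytypo
from nose.tools import assert_equal  # assumed; the excerpt omits its imports

def test_cut_repeat():
    assert_equal(pytypo.cut_repeat('pytypooooooo', 1), 'pytypo')  # at most 1 repeat
    assert_equal(pytypo.cut_repeat('beeeeer', 2), 'beer')  # at most 2 repeats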
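
The assertions above suggest that cut_repeat caps every run of a repeated character at the given length. The sketch below reproduces that behavior with a regular expression. It is an illustration only: cut_repeat_sketch and its parameter names are hypothetical, and pytypo's actual implementation may differ, for example in how it treats edge cases.

```python
import re


def cut_repeat_sketch(text, max_repeat):
    """Limit every run of one repeated character to max_repeat characters."""
    if max_repeat < 1:
        raise ValueError("max_repeat must be at least 1")
    # (.) captures one character; \1{max_repeat,} matches max_repeat or more
    # further copies, so a match is always a run longer than max_repeat.
    pattern = re.compile(r"(.)\1{%d,}" % max_repeat)
    # Replace each overlong run with exactly max_repeat copies.
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)


print(cut_repeat_sketch("pytypooooooo", 1))
print(cut_repeat_sketch("beeeeer", 2))
print(cut_repeat_sketch("cooooollllll", 3))
```

Runs that are already at or below the limit are left alone, so a word like "beer" passes through unchanged with a limit of 2.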