Python strip_tweet Examples

Programming language: Python

Namespace/package name: stopwords

Method/function: strip_tweet

Examples at hotexamples.com: 2

Python strip_tweet - 2 examples found. These are the top-rated real-world Python examples of stopwords.strip_tweet, extracted from open source projects.

Example #1

File: keywords.py Project: mrdolittle/Solarsmith

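import operator
from functools import reduce

import nltk

# strip_tweet lives in the project's stopwords module (the namespace listed
# above). get_names, get_hashtags, explicit_keywords, extract_keywords_grammar,
# filter_keywords and non_aggresive_stemmer are project helpers that are not
# shown in this excerpt.
from stopwords import strip_tweet


def extract_keywords(sentence):
    '''Extracts hashtags and keywords from a tweet, stores them in a
    neat little list of tuples of keyword and a confidence factor of
    some sort (currently hard-coded to 1.0, but might change in
    future, or might not and just be really stupid).

    TODO: perhaps try to filter hashtags from the explicit_keywords
          and extract_keywords_grammar stuff in the same manner as we
          don't count names doubly.
    '''

    def concat(*a):
        # Flatten several lists into a single list.
        return reduce(operator.add, a, [])

    stripped = strip_tweet(sentence)

    names = set(get_names(sentence))  # made into a set to speed up the filtering below
    hashtags = get_hashtags(sentence)

    # Python 3: tuple-unpacking lambdas are gone and map/filter return
    # iterators, so build real lists before concatenating them.
    tokens = [non_aggresive_stemmer(w) for w in nltk.word_tokenize(stripped)]
    explicit = list(explicit_keywords(tokens))
    grammar = [non_aggresive_stemmer(k)
               for k in filter_keywords(extract_keywords_grammar(stripped),
                                        key=lambda a: a[0])]
    keywords = [(a.lower(), b) for (a, b) in concat(explicit, grammar)]

    return concat([kw for kw in keywords if kw[0] not in names],
                  [(x.lower(), 5.0) for x in hashtags],
                  [(x.lower(), 1.6) for x in names])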
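
The return expression above merges three sources of (keyword, confidence) pairs: general keywords with names filtered out, hashtags weighted 5.0, and names weighted 1.6. The following is a minimal, self-contained sketch of just that merging step. merge_keywords is a hypothetical name introduced here for illustration, the input lists are hand-written stand-ins for what the project helpers would return, and sorted() is added only to make the output order predictable.

```python
import operator
from functools import reduce


def concat(*lists):
    """Flatten several lists into one, like the helper in extract_keywords."""
    return reduce(operator.add, lists, [])


def merge_keywords(keywords, hashtags, names):
    """Hypothetical helper: combine keyword pairs with weighted hashtags and names.

    The weights 5.0 and 1.6 are the ones hard-coded in extract_keywords.
    """
    names = set(names)
    lowered = [(word.lower(), conf) for (word, conf) in keywords]
    return concat([kw for kw in lowered if kw[0] not in names],
                  [(tag.lower(), 5.0) for tag in hashtags],
                  sorted((name.lower(), 1.6) for name in names))


# Hand-written sample data, assumed for illustration only.
result = merge_keywords([("Solar", 1.0), ("alice", 1.0), ("panel", 1.0)],
                        ["Energy"],
                        ["alice"])
print(result)
# [('solar', 1.0), ('panel', 1.0), ('energy', 5.0), ('alice', 1.6)]
```

Note that the filter compares lowercased keywords against the names exactly as given, so a capitalized name such as "Alice" would not remove the keyword "alice" and the word would appear twice with different weights.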

Example #2

File: features.py Project: mrdolittle/Solarsmith

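import nltk

# strip_tweet lives in the project's stopwords module (the namespace listed above).
from stopwords import strip_tweet


def extract_features(text):
    '''Returns the unique adjective and adverb/verb phrases found in text.'''
    sequence = nltk.pos_tag(nltk.word_tokenize(text))
    # NOTE: tagging above runs on the raw text; as in the original, the
    # stripped text assigned here is not used afterwards.
    text = strip_tweet(text)
    grammar = '''Adjective: {<RBR>*(<JJ>|<JJS>|<JJT>|<JJR>)+}
               RbVerb: {(<RB>*(<VBN>|<VB>|<VBP>|<VBG>))+}'''
    chunks = nltk.RegexpParser(grammar)
    feat = []
    # print(chunks.parse(sequence))
    for t in chunks.parse(sequence).subtrees():
        # Tree.label() replaced the .node attribute in NLTK 3.
        if t.label() == "Adjective":
            # Join the words of the chunk, dropping the POS tags.
            feat.append(" ".join(word for (word, _tag) in t))
        elif t.label() == "RbVerb":
            line = " ".join(word for (word, _tag) in t)
            if len(t) > 1:
                # Expand contractions that the tokenizer split off.
                line = line.replace("n't", "not").replace("'m", "am")
            feat.append(line)

    return list(set(feat))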
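
To make the RbVerb branch above easier to follow, here is a small sketch of the join-and-expand step on its own, without NLTK. chunk_to_phrase is a hypothetical helper name, and the (word, tag) pairs are hand-written assumptions about what the tokenizer and tagger might produce, not actual tagger output.

```python
def chunk_to_phrase(chunk, expand_contractions=False):
    """Join the words of a chunk of (word, tag) pairs into one phrase.

    Mirrors extract_features: contractions are only expanded for chunks
    with more than one token.
    """
    phrase = " ".join(word for (word, _tag) in chunk)
    if expand_contractions and len(chunk) > 1:
        phrase = phrase.replace("n't", "not").replace("'m", "am")
    return phrase


# Hand-written tagged chunks, assumed for illustration only.
negated = [("do", "VBP"), ("n't", "RB"), ("like", "VB")]
progressive = [("'m", "VBP"), ("going", "VBG")]
single = [("'m", "VBP")]

print(chunk_to_phrase(negated, expand_contractions=True))      # do not like
print(chunk_to_phrase(progressive, expand_contractions=True))  # am going
print(chunk_to_phrase(single, expand_contractions=True))       # 'm
```

The last call shows a consequence of the len(t) > 1 check: a contraction that forms a chunk by itself is stored unexpanded.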