Python Preprocessing_nlp.review_to_words Examples

Programming language: Python

Class/type: Preprocessing_nlp

Method/function: review_to_words

Examples on hotexamples.com: 1

Python Preprocessing_nlp.review_to_words - 1 example found. These are the top-rated real-world Python examples of Preprocessing_nlp.review_to_words, extracted from open source projects. You can rate the examples to help improve their quality.

Frequently used methods

review_to_wordlist(1)

review_to_words(1)

Example #1

File: Format1_clean.py Project: jp1989326/Machine_learning_for_reliability_analysis

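import pandas as pd

# data_path is not defined in this excerpt; it must point to the data directory.
# quoting=3 corresponds to csv.QUOTE_NONE, so quote characters are read literally.
df = pd.read_csv(data_path + 'labeledTrainData.tsv', header=0,
                 delimiter='\t', quoting=3)
# Count the documents in the column that the loop below iterates over.
num_docus = df['clean_url'].size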


# 1. Remove the HTML markup (like <br>), remove non-letters, convert to lower case, split into
# words, remove stopwords, and join the words back into one string separated by spaces.

import Preprocessing_nlp as pre
clean_docus = []
for i in range(num_docus):
    # Report progress every 1000 documents.
    if (i + 1) % 1000 == 0:
        print("review %d of %d\n" % (i + 1, num_docus))

    clean_docus.append(pre.review_to_words(df['clean_url'][i], filter_words='timeline'))

# 1.2 further filtering more words (optional)


    
#####################################################
# 2.1 Create features from the bag of words
print('creating the bag of words...\n')
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(analyzer='word',
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             max_features=5000)  # keep only the 5000 most frequent terms
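
The Preprocessing_nlp module itself is not shown in this example, so the body of review_to_words is unknown. The following is only a minimal sketch of what such a function might look like, following the steps listed in the comment above (strip HTML markup, drop non-letters, lowercase, split, remove stopwords, rejoin). The STOP_WORDS set, the regular expressions, and the interpretation of filter_words as an extra word (or collection of words) to drop are all assumptions made for illustration, not the project's actual implementation.

```python
import re

# Assumption: a tiny illustrative stop-word set; the real module
# probably relies on a much fuller list.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "this", "was"}


def review_to_words(raw_text, filter_words=None):
    """Hypothetical stand-in for Preprocessing_nlp.review_to_words."""
    # Replace HTML tags such as <br> with a space.
    text = re.sub(r"<[^>]+>", " ", raw_text)
    # Replace every character that is not a letter with a space.
    text = re.sub(r"[^a-zA-Z]", " ", text)
    # Lowercase and split on whitespace.
    words = text.lower().split()
    # Assumption: filter_words is one extra word, or an iterable of
    # words, to remove in addition to the stop words.
    if filter_words is None:
        extra = set()
    elif isinstance(filter_words, str):
        extra = {filter_words.lower()}
    else:
        extra = {w.lower() for w in filter_words}
    blocked = STOP_WORDS | extra
    # Join the surviving words back into one space-separated string.
    return " ".join(w for w in words if w not in blocked)


cleaned = review_to_words("This movie was <br />GREAT, see the Timeline!",
                          filter_words="timeline")
print(cleaned)
```

Returning a single space-separated string rather than a list matches how the result is used above: CountVectorizer with analyzer = 'word' expects each document as one string.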