import re
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# regexp matching the default token pattern used by CountVectorizer
regexp = re.compile('(?u)\\b\\w\\w+\\b')

# load the spaCy English model and save the original tokenizer
en_nlp = spacy.load('en')
old_tokenizer = en_nlp.tokenizer
# replace the tokenizer with the regexp above
en_nlp.tokenizer = lambda string: old_tokenizer.tokens_from_list(
    regexp.findall(string))

def custom_tokenizer(document):
    # custom tokenizer: run the spaCy pipeline (skipping entity
    # recognition and parsing) and return the lemma of each token
    doc_spacy = en_nlp(document, entity=False, parse=False)
    return [token.lemma_ for token in doc_spacy]

# CountVectorizer using the lemmatizing tokenizer
lemma_vect = CountVectorizer(tokenizer=custom_tokenizer, min_df=5)
X_train_lemma = lemma_vect.fit_transform(text_train)
print("X_train_lemma.shape: {}".format(X_train_lemma.shape))

# standard CountVectorizer for comparison
vect = CountVectorizer(min_df=5).fit(text_train)
X_train = vect.transform(text_train)
print("X_train.shape: {}".format(X_train.shape))

# grid search: use only 1% of the data as the training fold to keep
# the search fast
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10]}
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.99,
                            train_size=0.01, random_state=0)
grid = GridSearchCV(LogisticRegression(), param_grid, cv=cv)
grid.fit(X_train, y_train)
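# A minimal sketch (not part of the original script) of how the two
# feature sets could be compared: `best_score_` and `best_params_` are
# standard GridSearchCV attributes; repeating the same search on
# X_train_lemma (built above) gives a like-for-like comparison.
print("Best cross-validation score (standard): {:.3f}".format(
    grid.best_score_))
print("Best parameters:", grid.best_params_)

grid_lemma = GridSearchCV(LogisticRegression(), param_grid, cv=cv)
grid_lemma.fit(X_train_lemma, y_train)
print("Best cross-validation score (lemmatized): {:.3f}".format(
    grid_lemma.best_score_))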
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.metrics import precision_recall_curve

'''
pipe = Pipeline([('vect', CountVectorizer()),
                 ('clf', MultinomialNB())])
'''

param_grid = {
    'vect__min_df': [1, 2, 3, 4, 5],
    'clf__alpha': [1, 0.1, 0.01, 0.001, 0.0001, 0.00001]
}

vectorizer = CountVectorizer(min_df=5)
Xtrain = vectorizer.fit_transform(X_train)
# the test set must be transformed with the vocabulary learned on the
# training set, not refit
Xtest = vectorizer.transform(X_test)

# write the results of each parameter combination to a timestamped CSV
import datetime
now = datetime.datetime.now().strftime("%Y%m%d%H%M")
filename = './model_save/naive_bayes_' + now

import csv
f = open(filename + ".csv", "w")
csvWrite = csv.writer(f)
csvWrite.writerow(["min_df", "alpha", "score", "recall", "precision"])

count = 1
for alpha in param_grid['clf__alpha']: