# Done reading the CSV; release the file handle.
csvfile.close()
testdata = np.array(testdata, dtype=int)

# Build a TF-IDF representation of the documents.
# When `doc` is truthy, the vectorizer's document-frequency statistics are
# fit on the combined corpus + data so both share one vocabulary; otherwise
# it is fit on `data` alone. In both cases only `data` is transformed.
if doc:
    vectorizer = TfidfVectorizer(max_df=max_df, min_df=min_df,
                                 max_features=max_features,
                                 stop_words='english')
    train = corpus + data
    vectorizer = vectorizer.fit(train)
else:
    vectorizer = TfidfVectorizer(max_df=max_df, min_df=min_df,
                                 max_features=max_features,
                                 stop_words='english')
    # BUG FIX: was `vectorizer.fir(data)` — AttributeError at runtime;
    # the intended call is fit().
    vectorizer = vectorizer.fit(data)

X = vectorizer.transform(data)

print("done in %fs" % (time() - t0))
print("n_samples: %d, n_features: %d" % X.shape)
print()

if True:
    print("Performing dimensionality reduction using LSA")
    t0 = time()
    # Vectorizer results are normalized, which makes KMeans behave as
    # spherical k-means for better results. Since LSA/SVD results are
    # not normalized, we have to redo the normalization.
    svd = TruncatedSVD(n_components)
    normalizer = Normalizer(copy=False)
    lsa = make_pipeline(svd, normalizer)
for keyword in stopwords: text = text.replace(keyword, "") ### append the text to word_data word_data.append(text) ### append a 0 to from_data if email is from Sara, and 1 if email is from Chris if name=="sara"": from_data.append(0) else: from_data.append(1) email.close() print "emails processed" from_sara.close() from_chris.close() pickle.dump( word_data, open("your_word_data.pkl", "w") ) pickle.dump( from_data, open("your_email_authors.pkl", "w") ) ### in Part 4, do TfIdf vectorization here #the given stopword is "english" vectorizer=TV(stop_words="english") vectorizer.fir(word_data) vectorizer.transform(word_data) feature_words=vectorizer.get_feature_names() #print out info print "total number of words: ", len(feature_words)