Python TfidfVectorizer._document_frequency Examples

Programming language: Python

Namespace/package name: sklearn.feature_extraction.text

Class/type: TfidfVectorizer

Method/function: _document_frequency

Examples at hotexamples.com: 1

Python TfidfVectorizer._document_frequency - 1 example found. These are the top-rated real-world Python examples of sklearn.feature_extraction.text.TfidfVectorizer._document_frequency extracted from open source projects.

Example #1

                             strip_accents='unicode',
                             norm='l2',
                             sublinear_tf=True)
tfRawMatrix = vectorizer.fit_transform(lines[0:2000])
print(tfRawMatrix)
print("Data dimensions: {}".format(tfRawMatrix.shape))
# Convert the sparse document-term matrix to a dense NumPy array
tfdtm = tfRawMatrix.toarray()
print(tfdtm)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() returns an ndarray
tfVocab = vectorizer.get_feature_names_out()
print(tfVocab[79])

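# _document_frequency is a private module-level helper in sklearn.feature_extraction.text
# that takes the matrix as its argument; it is not a vectorizer method.
# Counting stored entries per column gives the same document frequencies.
docFreq = np.diff(tfRawMatrix.tocsc().indptr)
print(docFreq)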

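print(tfdtm[1, 204])

# Open question: why is the TF-IDF score for index 79 (i.e. "flatline") equal to 1?
# One possible cause: with norm='l2', a document whose only remaining vocabulary
# term is that word gets a score of exactly 1 after normalization.

# Perform count vectorization, which is the same as building a bag of words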
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer(min_df=0.006,
                             stop_words=stopwordList,
                             strip_accents='unicode',
                             binary=False)
rawdtm = count_vect.fit_transform(lines[0:2000])
vocab = count_vect.get_feature_names_out()
# Convert the sparse document-term matrix to a dense NumPy array
dtm = rawdtm.toarray()
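The comment in the example asks why a TF-IDF score can come out as exactly 1. A minimal sketch of one possible explanation follows: with `norm='l2'`, every row is scaled to unit length, so a document with a single remaining vocabulary term gets 1 for that term. Whether this is what happened in the original data is an assumption, since the input `lines` and the full vectorizer arguments are not shown. The corpus below is hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical); the first document contains only one term
docs = [
    "flatline",
    "flatline monitor alarm",
    "monitor alarm reset",
]

vectorizer = TfidfVectorizer(norm="l2", sublinear_tf=True)
tfidf = vectorizer.fit_transform(docs).toarray()

# Column index of the term of interest
col = vectorizer.vocabulary_["flatline"]

# Score of "flatline" in the single-term document and in a longer one
single_term_score = tfidf[0, col]
multi_term_score = tfidf[1, col]
print(single_term_score, multi_term_score)

# Every non-empty row has unit L2 norm after normalization
row_norms = np.linalg.norm(tfidf, axis=1)
print(row_norms)
```

In the longer document the same term scores below 1, because the unit length is shared across three terms. Pruning options such as `min_df` or `stop_words` can also leave a document with a single surviving term, which would produce the same effect.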