from sklearn.feature_extraction.text import TfidfVectorizer


def tf_idf_features(train_data, test_data):
    # TF-IDF (weighted bag-of-words) representation
    tf_idf_vectorize = TfidfVectorizer(max_df=0.8,
                                       strip_accents='unicode',
                                       lowercase=True,
                                       ngram_range=(1, 1),
                                       norm='l2',
                                       stop_words='english')
    # TF-IDF features for the training data
    tf_idf_train = tf_idf_vectorize.fit_transform(train_data.data)
    # Maps each feature index to the word it represents
    feature_names = tf_idf_vectorize.get_feature_names_out()
    tf_idf_test = tf_idf_vectorize.transform(test_data.data)
    shape = tf_idf_train.shape
    print('{} train data points.'.format(shape[0]))
    print('{} feature dimension.'.format(shape[1]))
    # "Most common" here means the word with the largest total TF-IDF weight
    print('Most common word in training set is "{}"'.format(
        feature_names[tf_idf_train.sum(axis=0).argmax()]))
    return tf_idf_train, tf_idf_test, feature_names
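# Usage sketch (an addition, not part of the original code): tf_idf_features
# expects corpus objects that expose a .data list of raw documents, such as
# the Bunch returned by sklearn's fetch_20newsgroups. The dataset used here
# is an assumption chosen purely for illustration.
from sklearn.datasets import fetch_20newsgroups

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')
train_tfidf, test_tfidf, words = tf_idf_features(newsgroups_train,
                                                 newsgroups_test)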
import numpy as np
from pandas import DataFrame
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

try:
    # The original loading step is not shown; load_files('wikidata') is an
    # assumption inferred from the error message below.
    dataset = load_files('wikidata')
except OSError as ex:
    print(ex)
    print("Couldn't import the data, did you unzip the wikidata.zip folder?")
    exit(-1)

docs = dataset['data']
target = dataset['target']
docs_train, docs_test, y_train, y_test = train_test_split(
    docs, target, test_size=.2, random_state=0)

# Character n-grams of length 1 to 5; character features are robust to
# out-of-vocabulary words and misspellings
vec = TfidfVectorizer(ngram_range=(1, 5), analyzer='char', use_idf=True)
mlp = MLPClassifier()
model = make_pipeline(vec, mlp)
model.fit(docs_train, y_train)
y_predicted = model.predict(docs_test)

target_names = [dataset.target_names[i] for i in np.unique(y_train)]
print(classification_report(y_test, y_predicted, target_names=target_names))

cm = confusion_matrix(y_test, y_predicted)
predicted_names = ['p_' + s for s in target_names]
dfcm = DataFrame(cm, columns=predicted_names, index=target_names)
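# Follow-up sketch (an addition, not part of the original script): dividing
# each row of the confusion matrix by its row sum converts raw counts into
# per-class recall, which makes classes of different sizes easier to compare.
dfcm_normalized = dfcm.div(dfcm.sum(axis=1), axis=0)
print(dfcm_normalized.round(2))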