Python CountVectorizer.fit_Transform Examples

Programming Language: Python

Namespace/Package Name: sklearn.feature_extraction.text

Class/Type: CountVectorizer

Method/Function: fit_Transform

Examples at hotexamples.com: 2

The CountVectorizer class in Python's scikit-learn library is used for text processing and feature extraction. Given a text corpus, it produces a sparse matrix of token counts, with one row per document and one column per vocabulary term.

Here's an example of using CountVectorizer:

from sklearn.feature_extraction.text import CountVectorizer
corpus = ['This is the first document.', 
          'This document is the second document.', 
          'And this is the third one.', 
          'Is this the first document?']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(X.toarray())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vectorizer.get_feature_names_out())

The code above prints the document-term count matrix (one row per document), followed by the vocabulary terms that label its columns.

CountVectorizer does not determine sentiment itself, but it can prepare a collection of text reviews as input for a sentiment classifier:

from sklearn.feature_extraction.text import CountVectorizer
corpus = ['I loved the movie!', 
          'The acting was terrible.', 
          'The plot was confusing.', 
          'Great movie, would recommend.']
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

In this example, CountVectorizer converts the text reviews into a count matrix, which can then be used as input for a sentiment analysis algorithm. Both examples use scikit-learn (sklearn).
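
As a rough sketch of that next step, the count matrix can be passed to a classifier. The labels below are hypothetical (they are not part of the original example), and `MultinomialNB` is only one possible choice; any scikit-learn classifier that accepts sparse input would work the same way.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

corpus = ['I loved the movie!',
          'The acting was terrible.',
          'The plot was confusing.',
          'Great movie, would recommend.']
# Hypothetical labels for illustration: 1 = positive, 0 = negative.
labels = [1, 0, 0, 1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # learn the vocabulary and count tokens

clf = MultinomialNB()
clf.fit(X, labels)

# New text goes through transform(), not fit_transform(), so it is encoded
# with the vocabulary learned from the training corpus.
new_reviews = ['Loved the plot, great acting.']
X_new = vectorizer.transform(new_reviews)
predictions = clf.predict(X_new)
print(predictions)
```

With only four training reviews the prediction itself is not meaningful; the point is the fit_transform / transform split between training and new data.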

Python CountVectorizer.fit_Transform - 2 examples found. These are the top rated real world Python examples of sklearn.feature_extraction.text.CountVectorizer.fit_Transform extracted from open source projects.
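
Python attribute lookup is case-sensitive, so the capitalized spelling `fit_Transform` in this page's title does not resolve to the library's `fit_transform` method. A minimal check, using an illustrative one-document input:

```python
from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()

has_correct = hasattr(vect, "fit_transform")     # the real method
has_misspelled = hasattr(vect, "fit_Transform")  # capital T: not defined

try:
    vect.fit_Transform(["some text"])
    error_name = None
except AttributeError as exc:
    # Calling the misspelled name fails before any vectorizing happens.
    error_name = type(exc).__name__

print(has_correct, has_misspelled, error_name)
# True False AttributeError
```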

Frequently Used Methods

CountVectorizer(30)

_validate_vocabulary(30)

fit_transform(30)

fit(30)

build_tokenizer(30)

build_analyzer(30)

get_stop_words(30)

get_params(21)

get_feature_names_out(15)

build_preprocessor(13)

__init__(10)

get_feature_names(9)

dictionary_freeze(6)

count(4)

analyzer(4)

fixed_vocabulary(3)

astype(3)

_count_vocab(2)

copy(2)

fit_trainsform(2)

get_features_names(2)

append(2)

_word_ngrams(2)

get_feature_name(1)

getSenVec(1)

_sort_features(1)

get_features(1)

get_sentence_vector(1)

get_shape(1)

getOutputCol(1)

fit_Transform(1)

fit_trasform(1)

fit_transfrom(1)

fit_transforn(1)

__repr__(1)

fir_transform(1)

__dict__(1)

extract_ngrams(1)

delete_temporary_training_data(1)

count_features(1)

_limit_features(1)

fir(1)

Example #1

File: class16.py Project: vyang91/DAT8

y_pred_prob = logreg.predict_proba(X_test)[:, 1]

from sklearn import metrics

metrics.accuracy_score(y_test, y_pred_class)
metrics.confusion_matrix(y_test, y_pred_class)

logreg.fit(X, y)
X_oos = test[feature_cols]
oos_pred_prob = logreg.predict_proba(X_oos)[:, 1]

###
submit = pd.DataFrame({"id": test.index, "OpenStatus": oos_pred_prob}).set_index("id")
submit.to_csv("sub2.csv")
###

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()
dtm = vect.fit_transform(train.Title)

X = dtm
y = train.OpenStatus

from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()

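vect = CountVectorizer(stop_words="english")
# The original source called vect.fit_Transform here, which raises
# AttributeError; the method name is fit_transform.
dtm = vect.fit_transform(train.Title)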

Example #2

File: class16.py Project: victoryang1/DAT8

y_pred_class = logreg.predict(X_test)
y_pred_prob = logreg.predict_proba(X_test)[:, 1]

from sklearn import metrics
metrics.accuracy_score(y_test, y_pred_class)
metrics.confusion_matrix(y_test, y_pred_class)

logreg.fit(X, y)
X_oos = test[feature_cols]
oos_pred_prob = logreg.predict_proba(X_oos)[:, 1]

###
submit = pd.DataFrame({'id':test.index, 'OpenStatus':oos_pred_prob}).set_index('id')
submit.to_csv('sub2.csv')
###

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
dtm = vect.fit_transform(train.Title)

X = dtm
y = train.OpenStatus

from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()

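vect = CountVectorizer(stop_words='english')
# The original source called vect.fit_Transform here, which raises
# AttributeError; the method name is fit_transform.
dtm = vect.fit_transform(train.Title)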