Python CountVectorizer.getOutputCol Examples

Programming Language: Python

Namespace/Package Name: pyspark.ml.feature

Class/Type: CountVectorizer

Method/Function: getOutputCol

Examples at hotexamples.com: 1

Python CountVectorizer.getOutputCol - 1 example found. This is a real-world Python example of pyspark.ml.feature.CountVectorizer.getOutputCol extracted from an open source project. getOutputCol returns the name of the output column configured on a pipeline stage, so a later stage can take it as its input column.
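The chaining pattern used in Example #1 can be sketched without Spark. The block below is only an illustration: `Stage`, `STOP_WORDS`, and the lambdas are hypothetical stand-ins, not PySpark classes. It shows why each stage reads its input column name from the previous stage's getOutputCol() instead of repeating the string.

```python
import re


class Stage:
    """Hypothetical stand-in for a pipeline stage; NOT a PySpark class."""

    def __init__(self, inputCol, outputCol, fn):
        self.inputCol = inputCol
        self.outputCol = outputCol
        self.fn = fn  # function applied to the input column's value

    def getOutputCol(self):
        # Return the configured output column name, mirroring the getter
        # that the example calls on each stage.
        return self.outputCol

    def transform(self, row):
        # Copy the row and add the output column computed from the input column.
        out = dict(row)
        out[self.outputCol] = self.fn(row[self.inputCol])
        return out


STOP_WORDS = {"the", "a", "is"}  # tiny illustrative list, an assumption

# Split on non-word characters, dropping empty tokens.
tokenizer = Stage("reviewText", "wordsReview",
                  lambda text: [t for t in re.split(r"\W+", text.lower()) if t])

# The input column is taken from the previous stage, so renaming
# "wordsReview" in one place keeps the chain consistent.
swr = Stage(tokenizer.getOutputCol(), "reviewFiltered",
            lambda words: [w for w in words if w not in STOP_WORDS])

row = {"reviewText": "The battery is great"}
for stage in (tokenizer, swr):
    row = stage.transform(row)

print(row["reviewFiltered"])
```

In the real example the same wiring is done with RegexTokenizer, StopWordsRemover, CountVectorizer, and VectorAssembler, which operate on DataFrame columns rather than dictionaries.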

Frequently Used Methods

CountVectorizer(30)

_validate_vocabulary(30)

fit_transform(30)

fit(30)

build_tokenizer(30)

build_analyzer(30)

get_stop_words(30)

get_params(21)

get_feature_names_out(15)

build_preprocessor(13)

__init__(10)

get_feature_names(9)

dictionary_freeze(6)

count(4)

analyzer(4)

fixed_vocabulary(3)

astype(3)

_count_vocab(2)

copy(2)

fit_trainsform(2)

get_features_names(2)

append(2)

_word_ngrams(2)

get_feature_name(1)

getSenVec(1)

_sort_features(1)

get_features(1)

get_sentence_vector(1)

get_shape(1)

getOutputCol(1)

fit_Transform(1)

fit_trasform(1)

fit_transfrom(1)

fit_transforn(1)

__repr__(1)

fir_transform(1)

__dict__(1)

extract_ngrams(1)

delete_temporary_training_data(1)

count_features(1)

_limit_features(1)

fir(1)

Example #1

File: train.py Project: mpushkareva/ozon-masters-bigdata

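import pandas as pd
from pyspark.ml import PipelineModel
from pyspark.ml.feature import (CountVectorizer, RegexTokenizer,
                                StopWordsRemover, VectorAssembler)
from pyspark.ml.param import Params
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, DoubleType
from sklearn_wrapper import SklearnEstimatorModel  # presumably a module local to this project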

stop_words = StopWordsRemover.loadDefaultStopWords("english")
tokenizer = RegexTokenizer(inputCol="reviewText",
                           outputCol="wordsReview",
                           pattern="\\W")
swr = StopWordsRemover(inputCol=tokenizer.getOutputCol(),
                       outputCol="reviewFiltered",
                       stopWords=stop_words)
count_vectorizer = CountVectorizer(inputCol=swr.getOutputCol(),
                                   outputCol="reviewVector",
                                   binary=True,
                                   vocabSize=20)
assembler = VectorAssembler(
    inputCols=[count_vectorizer.getOutputCol(), 'verified'],
    outputCol="features")


@F.udf(ArrayType(DoubleType()))
def vectorToArray(row):
    return row.toArray().tolist()


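@F.pandas_udf(DoubleType())
def predict(series):
    # est_broadcast is not defined in this excerpt; it is assumed to be a
    # Spark broadcast variable wrapping a fitted sklearn estimator.
    predictions = est_broadcast.value.predict(series.tolist())
    return pd.Series(predictions)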


class HasSklearnModel(Params):