def test_vectorizer_empty_token_case():
    """A trailing delimiter produces an empty token.

    Empty tokens are ignored here, whereas sklearn treats them as a
    character. This deliberate divergence may be worth revisiting, but it
    should not be a concern for most pipelines.
    """
    corpus = ["a b "]
    docs_gpu = Series(corpus)

    # The trailing space yields an extra empty token; unlike sklearn we do
    # not treat it as a real token, so we force sklearn to split on " " to
    # make the comparison meaningful.
    gpu_counts = CountVectorizer(preprocessor=lambda s: s).fit_transform(docs_gpu)
    cpu_counts = SkCountVect(
        preprocessor=lambda s: s,
        tokenizer=lambda s: s.split(" "),
    ).fit_transform(corpus)
    cp.testing.assert_array_equal(gpu_counts.todense(), cpu_counts.toarray())

    gpu_hashed = HashingVectorizer(preprocessor=lambda s: s).fit_transform(docs_gpu)
    cpu_hashed = SkHashVect(
        preprocessor=lambda s: s,
        tokenizer=lambda s: s.split(" "),
    ).fit_transform(corpus)
    assert_almost_equal_hash_matrices(gpu_hashed.todense().get(), cpu_hashed.toarray())
def test_countvectorizer_custom_vocabulary():
    """A user-supplied vocabulary restricts features exactly as in sklearn."""
    vocabulary = {"pizza": 0, "beer": 1}
    gpu_vocabulary = Series(vocabulary.keys())
    expected = SkCountVect(vocabulary=vocabulary).fit_transform(DOCS)
    actual = CountVectorizer(vocabulary=gpu_vocabulary).fit_transform(DOCS_GPU)
    cp.testing.assert_array_equal(actual.todense(), expected.toarray())
def test_only_delimiters():
    """A document made up of only delimiters is handled like sklearn's."""
    docs = ['abc def. 123', ' ', '456 789']
    actual = CountVectorizer().fit_transform(Series(docs))
    expected = SkCountVect().fit_transform(docs)
    cp.testing.assert_array_equal(actual.todense(), expected.toarray())
def test_empty_doc_after_limit_features():
    """min_df pruning that empties a document must still match sklearn."""
    docs = ['abc abc def', 'def abc', 'ghi']
    actual = CountVectorizer(min_df=2).fit_transform(Series(docs))
    expected = SkCountVect(min_df=2).fit_transform(docs)
    cp.testing.assert_array_equal(actual.todense(), expected.toarray())
def test_count_vectorizer():
    """End-to-end comparison against sklearn on a small default corpus."""
    docs = [
        'This is the first document.',
        'This document is the second document.',
        'And this is the third one.',
        'Is this the first document?',
    ]
    actual = CountVectorizer().fit_transform(Series(docs))
    expected = SkCountVect().fit_transform(docs)
    cp.testing.assert_array_equal(actual.todense(), expected.toarray())
def test_countvectorizer_stop_words():
    """The built-in English stop-word list must behave like sklearn's."""
    expected = SkCountVect(stop_words='english').fit_transform(DOCS)
    actual = CountVectorizer(stop_words='english').fit_transform(DOCS_GPU)
    cp.testing.assert_array_equal(actual.todense(), expected.toarray())
def test_countvectorizer_separate_fit_transform():
    """Calling fit() then transform() separately must match sklearn."""
    actual = CountVectorizer().fit(DOCS_GPU).transform(DOCS_GPU)
    expected = SkCountVect().fit(DOCS).transform(DOCS)
    cp.testing.assert_array_equal(actual.todense(), expected.toarray())