Ejemplos de CountVectorizer.fit en Python

Lenguaje de programación: Python

Namespace/Package Name: scikits.learn.feature_extraction.text

Clase / Tipo: CountVectorizer

Método / Función: fit

Ejemplos en hotexamples.com: 3

Python CountVectorizer.fit - 3 ejemplos encontrados. Estos son los ejemplos en Python del mundo real mejor valorados de scikits.learn.feature_extraction.text.CountVectorizer.fit extraídos de proyectos de código abierto. Puedes valorar ejemplos para ayudarnos a mejorar la calidad de los ejemplos.

Métodos usados con frecuencia

Mostrar Ocultar

CountVectorizer(8)

transform(2)

__init__(1)

fit(1)

fit_transform(1)

max_df(1)

Ejemplo n.º 1

Mostrar archivo

Archivo: test_text.py Proyecto: jolos/scikit-learn

def test_countvectorizer_custom_vocabulary():
    what_we_like = ["pizza", "beer"]
    vect = CountVectorizer(vocabulary=what_we_like)
    vect.fit(JUNK_FOOD_DOCS)
    assert_equal(set(vect.vocabulary), set(what_we_like))
    X = vect.transform(JUNK_FOOD_DOCS)
    assert_equal(X.shape[1], len(what_we_like))

Ejemplo n.º 2

Mostrar archivo

def test_vectorizer_max_df():
    test_data = [u'abc', u'dea']  # the letter a occurs in all strings
    vect = CountVectorizer(CharNGramAnalyzer(min_n=1, max_n=1), max_df=1.0)
    vect.fit(test_data)
    assert u'a' in vect.vocabulary.keys()
    assert_equals(len(vect.vocabulary.keys()), 5)
    vect.max_df = 0.5
    vect.fit(test_data)
    assert u'a' not in vect.vocabulary.keys()  # 'a' is ignored
    assert_equals(len(vect.vocabulary.keys()), 4)  # the others remain

Ejemplo n.º 3

Mostrar archivo

Archivo: test_text.py Proyecto: aayushsaxena15/projects

def test_vectorizer_max_df():
    test_data = [u'abc', u'dea']  # the letter a occurs in all strings
    vect = CountVectorizer(CharNGramAnalyzer(min_n=1, max_n=1), max_df=1.0)
    vect.fit(test_data)
    assert u'a' in vect.vocabulary.keys()
    assert_equals(len(vect.vocabulary.keys()), 5)
    vect.max_df = 0.5
    vect.fit(test_data)
    assert u'a' not in vect.vocabulary.keys()  # 'a' is ignored
    assert_equals(len(vect.vocabulary.keys()), 4)  # the others remain