Python PandasTfidfVectorizer examples

Programming language: Python

Namespace/package name: pandas_transformers.transformers

Class/type: PandasTfidfVectorizer

Examples at hotexamples.com: 8

Python PandasTfidfVectorizer - 8 examples found. These are real-world examples of Python's pandas_transformers.transformers.PandasTfidfVectorizer, extracted from open-source projects.

Frequently used methods

PandasTfidfVectorizer(8)

fit(6)

transform(4)

Example #1

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

    import pytest

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_fit_with_df_input_without_column_arg(self, example_train_df):
        """
        If no column argument is given to the initializer, the input to fit
        must be a pd.Series. Otherwise, a TypeError is raised.
        """
        transformer = PandasTfidfVectorizer()
        with pytest.raises(TypeError):
            transformer.fit(example_train_df)

Example #2

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

    import pytest

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_missing_values_fit(self, example_missing_values_df):
        """
        Tests the case where the training data contains missing values.
        fit should raise a ValueError.
        """
        transformer = PandasTfidfVectorizer(column="text")
        with pytest.raises(ValueError):
            transformer.fit(example_missing_values_df)
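
Example #2 expects fit to reject training data that contains missing values. The following is a minimal, stdlib-only sketch of such a pre-check. The helper name check_no_missing is hypothetical, and documents are modeled as a plain Python list rather than a DataFrame column, so it illustrates the contract the test checks, not the library's actual code.

```python
import math
from typing import Iterable, Optional


def check_no_missing(docs: Iterable[Optional[object]]) -> None:
    """Raise ValueError if any document is None or a float NaN.

    Hypothetical helper: pandas commonly represents missing text as None
    or float('nan'), so both are treated as missing here.
    """
    for i, doc in enumerate(docs):
        if doc is None or (isinstance(doc, float) and math.isnan(doc)):
            raise ValueError(f"missing value at position {i}")


check_no_missing(["a house", "an animal"])  # passes silently
try:
    check_no_missing(["a house", float("nan")])
except ValueError as exc:
    print(exc)  # missing value at position 1
```

In a pandas-based wrapper, df[column].isna().any() would be the natural test. scikit-learn's text vectorizers also raise a ValueError when a document is np.nan, so a thin wrapper may get this behavior without an explicit check.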

Example #3

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

    import pytest

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_fit_with_series_input_with_column_arg(self, example_series):
        """
        If a value is given for the column keyword argument, the input
        must be a pd.DataFrame. Otherwise, a TypeError is raised.
        """
        transformer = PandasTfidfVectorizer(column="text")
        with pytest.raises(TypeError):
            transformer.fit(example_series)

Example #4

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

    import pandas as pd

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_example(self, example_train_df):
        """Tests a simple example."""
        transformer = PandasTfidfVectorizer(column="text")
        transformer.fit(example_train_df)
        transformed = transformer.transform(example_train_df)

        # The "text" column is replaced by one TF-IDF column per vocabulary
        # term; other columns ("num") are passed through unchanged.
        expected = pd.DataFrame({
            "num": pd.Series([3, 4, 4]),
            "animal": pd.Series([0.0, 1.0, 0.0]),
            "house": pd.Series([1.0, 0.0, 1.0]),
        })
        # Column order shouldn't matter, so sort the columns before comparing.
        pd.testing.assert_frame_equal(transformed.sort_index(axis=1),
                                      expected.sort_index(axis=1))
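
The exact 0.0/1.0 values in example #4 are consistent with scikit-learn's default TF-IDF settings (raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2 row normalization), assuming PandasTfidfVectorizer wraps sklearn's TfidfVectorizer with its defaults. Under those settings, a document that contains only one vocabulary term normalizes to exactly 1.0 for that term. The fixture's text is not shown, so the three documents below are hypothetical, and tokenization is simplified to a whitespace split.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Stdlib approximation of scikit-learn's default TF-IDF weighting.

    tf = raw count, idf = ln((1 + n) / (1 + df)) + 1, each row L2-normalized.
    Tokenization is a lowercase whitespace split (a simplification).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = {term: sum(term in toks for toks in tokenized) for term in vocab}
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = {term: counts[term] * idf[term] for term in vocab}
        norm = math.sqrt(sum(v * v for v in row.values()))
        # An all-zero row stays all zeros instead of dividing by zero.
        rows.append({t: (v / norm if norm else 0.0) for t, v in row.items()})
    return rows


# Hypothetical documents, each containing exactly one vocabulary term.
for row in tfidf_rows(["house", "animal", "house"]):
    print(row)
# {'animal': 0.0, 'house': 1.0}
# {'animal': 1.0, 'house': 0.0}
# {'animal': 0.0, 'house': 1.0}
```

With several distinct terms per document the values fall strictly between 0 and 1, and rarer terms receive larger weights.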

Example #5

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

    import pandas as pd

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_series_input(self, example_series):
        """
        If no value is given for the column keyword argument, the input
        should be a pandas Series or np.ndarray (otherwise a TypeError is
        raised). Fitting and transforming a Series returns a DataFrame with
        one TF-IDF column per vocabulary term.
        """
        transformer = PandasTfidfVectorizer()
        transformer.fit(example_series)
        transformed = transformer.transform(example_series)

        expected = pd.DataFrame({
            "animal": pd.Series([0.0, 1.0, 0.0]),
            "house": pd.Series([1.0, 0.0, 1.0]),
        })

        pd.testing.assert_frame_equal(transformed.sort_index(axis=1),
                                      expected.sort_index(axis=1))

Example #6

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

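    from sklearn.base import clone

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_clone(self):
        """
        Tests that sklearn's clone() preserves the constructor parameters.
        """
        transformer = PandasTfidfVectorizer(column="test", max_features=123)
        cloned = clone(transformer)

        assert transformer.column == cloned.column
        assert transformer.max_features == cloned.max_features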
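
scikit-learn's sklearn.base.clone builds a new, unfitted estimator from the original's constructor parameters, roughly type(est)(**est.get_params(deep=False)) with each value cloned or deep-copied. Example #6 therefore passes only if PandasTfidfVectorizer stores column and max_features as attributes with exactly those names. The stdlib sketch below mimics that mechanism with a hypothetical SketchVectorizer class and a simplified simple_clone function. Real code would inherit get_params from sklearn.base.BaseEstimator rather than reimplement it.

```python
import copy
import inspect


class SketchVectorizer:
    """Hypothetical stand-in for a vectorizer wrapper (not the real class).

    Every constructor argument is stored unchanged under an attribute of
    the same name, the convention that scikit-learn's get_params and
    clone rely on.
    """

    def __init__(self, column=None, max_features=None):
        self.column = column
        self.max_features = max_features

    def get_params(self):
        # Parameter names come from the __init__ signature; values are
        # read back from the attributes of the same name.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}


def simple_clone(estimator):
    """Build a new, unfitted instance from deep copies of the parameters."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    return type(estimator)(**params)


original = SketchVectorizer(column="test", max_features=123)
original.vocabulary_ = {"house": 0}  # pretend state learned during fit
cloned = simple_clone(original)
print(cloned.column, cloned.max_features)  # test 123
print(hasattr(cloned, "vocabulary_"))      # False
```

Because only constructor parameters are copied, fitted state such as a learned vocabulary is not carried over. This is why a wrapper that stored a renamed or transformed copy of an argument would break clone.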

Example #7

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

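    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_grid_search(self, example_train_df_binary):
        """Tests compatibility with GridSearchCV."""
        pipe = Pipeline([("tfidf", PandasTfidfVectorizer()),
                         ("model", LogisticRegression())])
        param_grid = {
            "tfidf__max_features": [5, 15],
        }

        # Without a column argument, the vectorizer expects a Series of texts.
        X = example_train_df_binary["text"]
        y = example_train_df_binary["y"]

        search = GridSearchCV(pipe, param_grid)
        search.fit(X, y)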
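
GridSearchCV tries every combination in param_grid and applies each one to the pipeline through nested parameter names of the form <step>__<param>, so tfidf__max_features reaches the max_features argument of the step named "tfidf". The stdlib sketch below illustrates both halves of that process. The helpers expand_grid and route_params are hypothetical simplifications of sklearn's ParameterGrid and of the "__" routing performed by set_params, not the library's code.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield every combination of a dict-of-lists grid, keys in sorted order."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def route_params(params):
    """Group 'step__param' keys by pipeline step name.

    The text before the first '__' names the step; the remainder is the
    parameter handed to that step.
    """
    routed = {}
    for key, value in params.items():
        step, _, param = key.partition("__")
        routed.setdefault(step, {})[param] = value
    return routed


grid = {"tfidf__max_features": [5, 15]}
for candidate in expand_grid(grid):
    print(route_params(candidate))
# {'tfidf': {'max_features': 5}}
# {'tfidf': {'max_features': 15}}
```

Adding a second key such as "model__C": [0.1, 1.0] would double the number of candidates, since every combination of values is tried.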

Example #8

File: test_transformers.py | Project: NedimBayrakdar/pandas-transformers

    import pytest

    from pandas_transformers.transformers import PandasTfidfVectorizer

    def test_missing_column(self, example_train_df,
                            example_test_df_diff_column):
        """
        Tests transform when the test set lacks the column used during fit.
        In that case, a KeyError should be raised.
        """
        transformer = PandasTfidfVectorizer(column="text")
        transformer.fit(example_train_df)

        with pytest.raises(KeyError):
            transformer.transform(example_test_df_diff_column)