from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
import numpy as np

# define the raw documents
docs = ["The cat in the hat.",
        "The cat sat on the mat.",
        "The dog chased the cat."]

# create a count matrix from the raw documents
count_vectorizer = CountVectorizer()
count_matrix = count_vectorizer.fit_transform(docs)

# create a tf-idf matrix from the count matrix
tfidf_transformer = TfidfTransformer()
tfidf_matrix = tfidf_transformer.fit_transform(count_matrix)

# check the shape of the tf-idf matrix
np.shape(tfidf_matrix)
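As a sanity check on the pipeline above, each column of the tf-idf matrix can be mapped back to the vocabulary term it represents via the fitted CountVectorizer. This is a sketch, not part of the original example; it assumes the same three documents.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["The cat in the hat.",
        "The cat sat on the mat.",
        "The dog chased the cat."]

# fit the count matrix and transform it to tf-idf weights
count_vectorizer = CountVectorizer()
count_matrix = count_vectorizer.fit_transform(docs)
tfidf_matrix = TfidfTransformer().fit_transform(count_matrix)

# vocabulary_ maps each term to its column index in the matrix,
# so we can print the per-document weight of every term
for term, col in sorted(count_vectorizer.vocabulary_.items()):
    print(term, tfidf_matrix[:, col].toarray().ravel())
```

Terms that appear in every document (like "the" and "cat") receive lower idf weights than terms unique to a single document (like "hat" or "chased").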
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# define the raw documents
docs = ["The quick brown fox jumps over the lazy dog.",
        "The quick brown fox jumps over the lazy dog and the quick brown fox.",
        "The quick brown fox is quick and brown."]

# TfidfTransformer expects a term-count matrix, not raw strings,
# so build one with CountVectorizer first
count_matrix = CountVectorizer().fit_transform(docs)

# create a tf-idf transformer with idf weighting enabled
tfidf_transformer = TfidfTransformer(use_idf=True)
tfidf_matrix = tfidf_transformer.fit_transform(count_matrix)

# print the tf-idf matrix
print(tfidf_matrix)

In this example, we define a list of three raw documents and first convert them into a term-count matrix, since TfidfTransformer operates on counts rather than raw text. We then use the TfidfTransformer tool to create a tf-idf matrix, specifying that we want inverse document frequency (idf) weighting, and print the resulting matrix to the console. TfidfTransformer is provided by the sklearn (scikit-learn) package, which builds on the SciPy ecosystem for Python.
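The two-step pipeline of CountVectorizer followed by TfidfTransformer can also be collapsed into a single step with scikit-learn's TfidfVectorizer, which accepts raw documents directly. A minimal sketch using the same three documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["The quick brown fox jumps over the lazy dog.",
        "The quick brown fox jumps over the lazy dog and the quick brown fox.",
        "The quick brown fox is quick and brown."]

# TfidfVectorizer combines counting and tf-idf weighting in one estimator
vectorizer = TfidfVectorizer(use_idf=True)
tfidf_matrix = vectorizer.fit_transform(docs)

# one row per document, one column per vocabulary term
print(tfidf_matrix.shape)
```

For most applications TfidfVectorizer is the more convenient choice; the separate TfidfTransformer is useful when you already have a count matrix, or want to reuse the same counts with different weighting schemes.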