Python TfidfTransformer.fit_tansform Examples

Programming Language: Python

Namespace/Package Name: sklearn.feature_extraction.text

Class/Type: TfidfTransformer

Method/Function: fit_tansform

Examples at hotexamples.com: 1

Python TfidfTransformer.fit_tansform: 1 example found. This is a real-world Python example of sklearn.feature_extraction.text.TfidfTransformer.fit_tansform extracted from an open source project. Note that fit_tansform is a misspelling. TfidfTransformer has no method by that name, so calling it raises AttributeError. The intended method is fit_transform.
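The sketch below checks that TfidfTransformer exposes fit_transform but has no fit_tansform attribute. The 2 x 3 count matrix is made up for illustration and stands in for real CountVectorizer output.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Made-up term-count matrix: 2 documents x 3 words (illustrative data only).
counts = np.array([[3, 0, 1],
                   [2, 1, 0]])

transformer = TfidfTransformer()

# The misspelled name does not exist on the class.
print(hasattr(transformer, "fit_tansform"))  # False

# The intended call: learn the idf weights and apply them in one step.
tfidf = transformer.fit_transform(counts)
print(tfidf.shape)  # (2, 3)
```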

Frequently Used Methods


TfidfTransformer(30)

fit(30)

fit_transform(30)

todense(12)

transform(8)

toarray(7)

_idf_diag(6)

get_feature_names(6)

get_params(4)

idf_(3)

astype(2)

_get_param_names(2)

set_params(2)

tocsc(2)

tocoo(2)

tolist(1)

tolil(1)

tocsr(1)

stop_words_(1)

getrow(1)

nonzero(1)

mean(1)

max(1)

__dict__(1)

get_shape(1)

fit_transformer(1)

fit_tansform(1)

eliminate_zeros(1)

build_analyzer(1)

__init__(1)

transpose(1)

Example #1


File: Demo2.py Project: zhenyusu/graduation_project

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

if __name__ == "__main__":
    # Each string is one document that has already been segmented into words
    # (for example with jieba); words are separated by spaces.
    corpus = [
        "我 来到 北京 清华大学",  # segmented first document
        "他 来到 了 网易 杭研 大厦",  # segmented second document
        "小明 硕士 毕业 与 中国 科学院",  # segmented third document
        "我 爱 北京 天安门",  # segmented fourth document
    ]
    # Converts the documents into a term-count matrix: element a[i][j] is
    # the count of word j in document i.
    vectorizer = CountVectorizer()
    # Computes the tf-idf weight of every word from that count matrix.
    transformer = TfidfTransformer()
    # The inner fit_transform builds the count matrix; the outer one
    # converts it into tf-idf weights.
    tfidf = transformer.fit_transform(vectorizer.fit_transform(corpus))
    # All words of the bag-of-words vocabulary, in column order
    # (scikit-learn >= 1.0; older releases use get_feature_names()).
    word = vectorizer.get_feature_names_out()
    # Dense tf-idf matrix: weight[i][j] is the tf-idf weight of word j in document i.
    weight = tfidf.toarray()
    # The outer loop walks over the documents, the inner loop over the vocabulary.
    for i in range(len(weight)):
        print("------- tf-idf word weights for document", i, "-------")
        for j in range(len(word)):
            print(word[j], weight[i][j])
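The script uses TfidfTransformer's defaults: norm="l2", use_idf=True, smooth_idf=True and sublinear_tf=False. With these settings, each word t gets idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t. Each raw count is multiplied by its idf, and every row is then scaled to unit Euclidean length. The sketch below reproduces the printed weights by hand with NumPy and compares them with scikit-learn, reusing the example's corpus and a default CountVectorizer.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = [
    "我 来到 北京 清华大学",
    "他 来到 了 网易 杭研 大厦",
    "小明 硕士 毕业 与 中国 科学院",
    "我 爱 北京 天安门",
]

counts = CountVectorizer().fit_transform(corpus).toarray().astype(float)
n_docs = counts.shape[0]

# Document frequency: in how many documents each word occurs.
df = (counts > 0).sum(axis=0)
# Smoothed idf, matching smooth_idf=True.
idf = np.log((1 + n_docs) / (1 + df)) + 1

# Raw term frequency times idf, then L2-normalize each document row.
manual = counts * idf
manual /= np.linalg.norm(manual, axis=1, keepdims=True)

fitted = TfidfTransformer().fit(counts)
sk = fitted.transform(counts).toarray()

print(np.allclose(fitted.idf_, idf))  # True
print(np.allclose(manual, sk))        # True
```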