def _compute_tfid(texts: RDD) -> RDD:
    """Attach a TF-IDF vector to every text object in *texts*.

    Parameters
    ----------
    texts : RDD
        RDD of objects exposing a ``.words`` token list and a
        ``set_tfidf(vector)`` mutator.

    Returns
    -------
    RDD
        The texts mapped through ``set_tfidf`` with their TF-IDF vector.
        (Bug fix: the original annotation claimed ``IDFModel``, but the
        function returns the mapped RDD, not the fitted model.)
    """
    # Hash each text's token list into a term-frequency vector.
    tf = HashingTF().transform(texts.map(lambda t: t.words))
    # IDF needs two passes over `tf` (fit, then transform), so cache it.
    tf.cache()
    idf = IDF().fit(tf)
    tfidfs = idf.transform(tf)
    # zip() requires both RDDs to have identical partitioning/ordering;
    # `tfidfs` derives from `texts` through deterministic maps, so it holds.
    text_tfs = texts.zip(tfidfs)
    # NOTE(review): assumes set_tfidf returns the mutated object — confirm.
    return text_tfs.map(lambda t: t[0].set_tfidf(t[1]))
def tfidf(self):
    """Compute TF-IDF vectors for ``self._sents`` and store the results.

    Side effects: raw term frequencies go to ``self._tf``, the fitted
    model to ``self.idf``, and a position -> TF-IDF-vector dict to
    ``self._tfidf``.  (NOTE(review): attribute naming is inconsistent —
    ``idf`` lacks the leading underscore the siblings use.)
    """
    term_freqs = HashingTF().transform(self._sents)
    self._tf = term_freqs
    # Cached because both fit() and transform() traverse this RDD.
    term_freqs.cache()
    idf_model = IDF().fit(term_freqs)
    self.idf = idf_model
    weighted = idf_model.transform(term_freqs)
    # Materialize on the driver, keyed by sentence position.
    self._tfidf = dict(enumerate(weighted.collect()))
def parseTextRDDToIndex(self, data, label=True):
    """Convert an RDD of text lines into TF-IDF feature vectors.

    When ``label`` is True, each line is "<label> <word> <word> ..." and
    an RDD of ``LabeledPoint`` is returned; otherwise each line is just
    whitespace-separated words and the raw TF-IDF RDD is returned.
    """
    if label:
        # First token is the numeric label; the rest are the document.
        labels = data.map(lambda line: float(line.split(" ", 1)[0]))
        documents = data.map(lambda line: line.split(" ", 1)[1].split(" "))
    else:
        documents = data.map(lambda line: line.split(" "))

    tf = HashingTF().transform(documents)
    tf.cache()  # reused by both fit() and transform()
    # Drop terms appearing in fewer than 2 documents.
    idfIgnore = IDF(minDocFreq=2).fit(tf)
    index = idfIgnore.transform(tf)

    if not label:
        return index
    # Pair labels with vectors; zip is safe since both share lineage.
    return labels.zip(index).map(
        lambda pair: LabeledPoint(pair[0], pair[1]))
def _compute_idf(texts: RDD) -> IDFModel:
    """Fit and return an IDF model over the hashed term frequencies of *texts*.

    *texts* is assumed to already be an RDD of token sequences suitable
    for ``HashingTF.transform``.
    """
    hashed = HashingTF().transform(texts)
    # fit() iterates the RDD; caching avoids recomputing the hashing pass.
    hashed.cache()
    return IDF().fit(hashed)
training_raw = sc.parallelize(traindata) labels = training_raw.map( lambda doc: doc["label"], # Standard Python dict access preservesPartitioning=True # This is obsolete. ) # While applying HashingTF only needs a single pass to the data, applying IDF needs two passes: # First to compute the IDF vector and second to scale the term frequencies by IDF. tf = HashingTF(numFeatures=numfeatures).transform( ## Use much larger number in practice training_raw.map(lambda doc: doc["text"].split(), preservesPartitioning=True)) tf.cache() idf = IDF().fit(tf) tfidf = idf.transform(tf) # Combine using zip training = labels.zip(tf).map(lambda x: LabeledPoint(x[0], x[1])) # TEST DATA testlabel = testlabels.map(lambda line: float(line)) t = reviewdata1.collect() l = testlabel.collect() testdata = [{"text":t[i],"label":l[i]} for i in range(len(l))] test_raw = sc.parallelize(testdata) testlabels = test_raw.map(