Python Tokenizer.save Examples

Programming language: Python

Namespace/package: pyspark.ml.feature

Class/type: Tokenizer

Method/function: save

Examples on hotexamples.com: 1

Python Tokenizer.save - 1 code example found. These are real-world Python examples of pyspark.ml.feature.Tokenizer.save, extracted from open source projects and selected as the top rated. Rating the examples helps surface higher-quality ones.

Frequently used methods

Tokenizer(30)

getOutputCol(30)

transform(30)

load(3)

show(3)

select(2)

randomSplit(1)

save(1)

setParams(1)

withColumn(1)

Example #1

    except Exception:
        logger.error("Can't input dataset")

    # Join posts_df and tags_df together and prepare training dataset
    selected_tags_df = tags_df.filter(tags_df.Tag.isin(
        tags_set.value)).na.drop(how='any')
    tags_questions_df = selected_tags_df.join(posts_df, "Id")
    training_df = tags_questions_df.select(['Tag', 'Body',
                                            'Id']).na.drop(how='any')
    logger.debug("successfully get training_df")

    # Tokenize post texts, then compute term frequency and inverse document frequency
    logger.debug("Start to generate TFIDF features")
    tokenizer = Tokenizer(inputCol="Body", outputCol="Words")
    tokenized_words = tokenizer.transform(training_df.na.drop(how='any'))
    tokenizer.save(tokenizer_file)
    hashing_TF = HashingTF(inputCol="Words",
                           outputCol="Features",
                           numFeatures=20000)
    hashing_TF.save(hashing_tf_file)
    TFfeatures = hashing_TF.transform(tokenized_words.na.drop(how='any'))

    idf = IDF(inputCol="Features", outputCol="IDF_features")
    idfModel = idf.fit(TFfeatures.na.drop())
    idfModel.save(idf_model_file)
    TFIDFfeatures = idfModel.transform(TFfeatures.na.drop(how='any'))
    logger.debug("Get TFIDF features successfully")

    # for feature in TFIDFfeatures.select("IDF_features", "Tag").take(3):
    #     logger.info(feature)
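
The excerpt above saves the tokenizer but never shows it being loaded again. The following is a minimal sketch of a save-and-load round trip, assuming a local Spark installation with Java available. The function names `roundtrip_tokenizer` and `run_locally`, the sample sentence, and the temporary path are hypothetical and not part of the original project. It uses `write().overwrite().save(path)` rather than plain `save(path)`, because plain `save` raises an error when the target path already exists.

```python
import os
import tempfile


def roundtrip_tokenizer(spark, path):
    """Save a Tokenizer to `path`, load it back, and tokenize one sample row.

    `spark` must be an active SparkSession. pyspark is imported lazily so
    this module can be imported on a machine without Spark installed.
    """
    from pyspark.ml.feature import Tokenizer

    # Same column names as in the excerpt above.
    tokenizer = Tokenizer(inputCol="Body", outputCol="Words")

    # Plain save(path) raises if `path` already exists;
    # write().overwrite() replaces any earlier copy instead.
    tokenizer.write().overwrite().save(path)

    # load() restores the saved parameters (inputCol, outputCol).
    reloaded = Tokenizer.load(path)

    # Hypothetical sample row, for illustration only.
    df = spark.createDataFrame([(1, "How do I save a Tokenizer")], ["Id", "Body"])
    words = reloaded.transform(df).first()["Words"]
    return reloaded, words


def run_locally():
    """Run the round trip against a throwaway local SparkSession (assumption)."""
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("tokenizer-save-sketch")
             .getOrCreate())
    try:
        # Hypothetical temporary location; the excerpt uses `tokenizer_file`.
        path = os.path.join(tempfile.mkdtemp(), "tokenizer")
        return roundtrip_tokenizer(spark, path)
    finally:
        spark.stop()
```

Calling `run_locally()` returns the reloaded `Tokenizer` and the tokens produced for the sample row. `Tokenizer` has no fitted state, so saving it only persists its parameters; the fitted `IDFModel` in the excerpt is the stage whose saved form carries learned values.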