Python IDF.randomSplit Examples

Programming language: Python

Namespace/package name: pyspark.ml.feature

Class/type: IDF

Method/function: randomSplit

Examples at hotexamples.com: 1

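Python IDF.randomSplit - 1 example found. This is the top-rated real-world Python example of pyspark.ml.feature.IDF.randomSplit, extracted from open source projects. Note that in the example, randomSplit is called on the DataFrame returned by the fitted IDF model's transform, not on the IDF estimator itself.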
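To make the semantics of the split concrete, here is a minimal pure-Python sketch of a weighted random split. It is an illustration under stated assumptions, not Spark's implementation: the helper name `random_split` is hypothetical, and it only mirrors the documented behaviour of `DataFrame.randomSplit`, namely that the weights are normalized if they do not sum to 1 and that the resulting split sizes are approximate rather than exact.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Split rows into len(weights) lists, roughly in proportion to weights.

    Hypothetical helper for illustration only; it is not how Spark
    implements DataFrame.randomSplit internally.
    """
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be a non-empty list of non-negative numbers")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Normalize the weights and turn them into cumulative upper bounds,
    # e.g. [0.8, 0.2] becomes [0.8, 1.0].
    bounds = list(accumulate(w / total for w in weights))
    bounds[-1] = 1.0  # guard against floating-point drift

    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        # Assign the row to the first bucket whose upper bound exceeds the draw.
        index = next(i for i, bound in enumerate(bounds) if draw < bound)
        splits[index].append(row)
    return splits


rows = list(range(1000))
train, test = random_split(rows, [0.8, 0.2], seed=13)
print(len(train), len(test))  # close to 800 and 200, but not exactly
```

Because each row is assigned independently, the proportions hold only on average. That is also why a seed matters when a train/test split needs to be repeatable between runs.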

Example #1

File: Ex2d.3.py Project: wel51x/Machine_Learning_and_Spark

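# Imports needed by this excerpt. `wrangled` is a DataFrame with a "words"
# column, presumably prepared in the part of the file not shown here.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import HashingTF, IDF, StopWordsRemover

# Remove stop words from the tokenized text
wrangled = StopWordsRemover(inputCol="words",
                            outputCol="terms").transform(wrangled)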

# Apply the hashing trick
wrangled = HashingTF(inputCol="terms", outputCol="hash",
                     numFeatures=1024).transform(wrangled)

# Convert hashed symbols to TF-IDF
sms = IDF(inputCol="hash",
          outputCol="features").fit(wrangled).transform(wrangled)

# View the first four records
sms.show(4, truncate=False)

# Split the data into training and testing sets
sms_train, sms_test = sms.randomSplit([0.8, 0.2], seed=13)

# Fit a Logistic Regression model to the training data
logistic = LogisticRegression(regParam=0.2).fit(sms_train)

# Make predictions on the testing data
prediction = logistic.transform(sms_test)

# Create a confusion matrix, comparing predictions to known labels
prediction.groupBy("label", "prediction").count().show()

# Compute accuracy, then weighted precision
multi_evaluator = MulticlassClassificationEvaluator()
accuracy = multi_evaluator.evaluate(prediction,
                                    {multi_evaluator.metricName: "accuracy"})
weighted_precision = multi_evaluator.evaluate(