Python RFormula.transform Examples

Programming language: Python

Namespace/package name: pyspark.ml.feature

Class/type: RFormula

Method/function: transform

Examples at hotexamples.com: 2

Python RFormula.transform - 2 examples found. These are the top-rated real-world Python examples of pyspark.ml.feature.RFormula.transform, extracted from open source projects.

Example #1

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import RFormula
    from pyspark.sql import SparkSession

    # Reuse the active SparkSession or create one
    spark = SparkSession.builder.getOrCreate()

    # Load the mushroom data into a Spark DataFrame
    mushroom_df = (spark.read
                   .option("inferSchema", "true")
                   .option("header", "true")
                   .csv("mushrooms.csv"))
    mushroom_df.show(5)
    mushroom_df.printSchema()

    # Drop rows that contain null values
    mushroom_df = mushroom_df.na.drop()
    # No extra label column is needed: Lab is already binary (EDIBLE or POISONOUS)
    # Drop the VeilType column so it is not used as a feature
    mushroom_df = mushroom_df.drop("VeilType")

    # Preprocess: use Lab as the label and every remaining column as a feature
    formula = RFormula(formula="Lab ~ .")
    formula_model = formula.fit(mushroom_df)
    prepared_df = formula_model.transform(mushroom_df)

    prepared_df.show(5)

    # Split the dataset into training and test sets
    train, test = prepared_df.randomSplit([0.7, 0.3])

    # Initialize the logistic regression classifier
    lr = LogisticRegression(labelCol="label", featuresCol="features")

    # Train the logistic regression model on the training set
    fittedLr = lr.fit(train)

    # Classify the test set
    result = fittedLr.transform(test)
    result.show(5)
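
The example above relies on RFormula to turn the string column `Lab` into a numeric `label` column. The following is a minimal pure-Python sketch of that idea, assuming the documented default of indexing string labels by descending frequency. The helper `index_labels`, the alphabetical tie-break, and the sample values are illustrative assumptions, not Spark's actual implementation.

```python
from collections import Counter


def index_labels(values):
    """Map string labels to float indices, most frequent label first.

    Hypothetical helper: mimics frequency-descending label indexing.
    Ties are broken alphabetically here, which is an assumption.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    mapping = {name: float(i) for i, name in enumerate(ordered)}
    return mapping, [mapping[v] for v in values]


# Made-up sample of the Lab column (not taken from mushrooms.csv)
lab = ["EDIBLE", "POISONOUS", "EDIBLE", "EDIBLE", "POISONOUS"]
mapping, label = index_labels(lab)
print(mapping)
print(label)
```

With such an indexing, the more frequent class receives index 0.0, so which class counts as "positive" for the classifier depends on the data rather than on the class name.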

Example #2

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import RFormula
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    # Reuse the active SparkSession or create one
    spark = SparkSession.builder.getOrCreate()

    # Load the zoo data into a Spark DataFrame
    zoo_df = (spark.read
              .option("inferSchema", "true")
              .option("header", "true")
              .csv("zoo.csv"))
    zoo_df.show(5)
    zoo_df.printSchema()

    # Add a binary column Is_Mammal (1 when Type is 1, otherwise 0)
    zoo_df = zoo_df.withColumn("Is_Mammal",
                               expr("CASE WHEN Type = 1 THEN 1 ELSE 0 END"))

    # Preprocess: Is_Mammal is the label, the listed columns are the features
    feature_cols = [
        "Hair", "Feathers", "Eggs", "Milk", "Airborne", "Aquatic", "Predator",
        "Toothed", "Backbone", "Breathes", "Venomous", "Fins", "Legs", "Tail",
        "Domestic", "Catsize",
    ]
    formula = RFormula(formula="Is_Mammal ~ " + " + ".join(feature_cols))
    formula_model = formula.fit(zoo_df)
    prepared_df = formula_model.transform(zoo_df)

    prepared_df.show(5)

    # Split the dataset into training and test sets
    train, test = prepared_df.randomSplit([0.7, 0.3])

    # Initialize the logistic regression classifier
    lr = LogisticRegression(labelCol="label", featuresCol="features")

    # Train the logistic regression model on the training set
    fittedLr = lr.fit(train)

    # Classify the test set
    result = fittedLr.transform(test)
    result.show()
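
The two examples use different formula styles: `Lab ~ .` selects every column except the label, while the zoo formula lists its terms explicitly with `+`. The sketch below illustrates how such a formula could be resolved into a label name and a list of feature columns. It is a simplified illustration rather than Spark's parser: `resolve_formula` is a hypothetical helper, the column names `CapShape` and `Odor` are made up, and other RFormula operators such as `:` and `-` are deliberately not handled.

```python
def resolve_formula(formula, columns):
    """Split 'label ~ terms' into (label, feature column names).

    Hypothetical helper: supports only '.' and '+' on the right-hand side.
    """
    label, rhs = (part.strip() for part in formula.split("~", 1))
    if rhs == ".":
        # '.' means every available column except the label itself
        features = [c for c in columns if c != label]
    else:
        features = [term.strip() for term in rhs.split("+")]
    missing = [f for f in features if f not in columns]
    if label not in columns or missing:
        raise ValueError(f"unknown columns: {missing or [label]}")
    return label, features


# Made-up column lists for illustration
dot_result = resolve_formula("Lab ~ .", ["Lab", "CapShape", "Odor"])
plus_result = resolve_formula(
    "Is_Mammal ~ Hair + Milk", ["Hair", "Milk", "Legs", "Is_Mammal"]
)
print(dot_result)
print(plus_result)
```

Listing terms explicitly, as in the zoo example, keeps helper columns such as `Type` out of the feature vector, which matters there because `Is_Mammal` is derived directly from `Type`.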