def __calc_linear_spark(self, df: DataFrame, ts_col: str, target_col: str):
    """
    Native Spark function for calculating linear interpolation on a DataFrame.

    :param df: prepared dataframe to be interpolated
    :param ts_col: timeseries column name
    :param target_col: column to be interpolated
    """
    # SQL CASE expression with three branches:
    #   1. the row was not flagged for interpolation -> keep the value as-is
    #   2. the value is null -> interpolate linearly between the previous and
    #      next known values for this target column (per-column helper columns)
    #   3. otherwise -> interpolate from the current value toward the next one
    #      using the shared previous/next timestamp columns
    interpolation_expr = f"""
    case when is_interpolated_{target_col} = false then {target_col}
    when {target_col} is null then
    (next_null_{target_col} - previous_{target_col})
    /(unix_timestamp(next_timestamp_{target_col})-unix_timestamp(previous_timestamp_{target_col}))
    *(unix_timestamp({ts_col}) - unix_timestamp(previous_timestamp_{target_col}))
    + previous_{target_col}
    else
    (next_{target_col}-{target_col})
    /(unix_timestamp(next_timestamp)-unix_timestamp(previous_timestamp))
    *(unix_timestamp({ts_col}) - unix_timestamp(previous_timestamp))
    + {target_col}
    end as {target_col}
    """

    # Drop the original target column so the CASE expression's alias does not
    # collide with it during selectExpr.
    remaining_cols: List[str] = df.columns
    remaining_cols.remove(target_col)

    interpolated_df: DataFrame = df.selectExpr(*remaining_cols, interpolation_expr)

    # Restore the dataframe's original column ordering before returning.
    return interpolated_df.select(*df.columns)
def convert_types_for_ml(df: DataFrame) -> DataFrame:
    """
    Decode the raw ``value`` column to a string, parse it as JSON against the
    module-level ``schema``, and flatten the parsed struct into top-level
    columns.

    :param df: dataframe with a binary/raw ``value`` column (presumably a
        streaming source payload — verify against the caller)
    :return: dataframe whose columns are the fields of ``schema``
    """
    stringified = df.selectExpr("CAST(value AS STRING)")
    parsed = stringified.select(from_json("value", schema=schema).alias("data"))
    return parsed.select("data.*")
def convert_types_elastic_for_ml(df: DataFrame) -> DataFrame:
    """
    Decode the raw ``value`` column to a string, parse it as JSON against the
    module-level ``schema_elastic``, flatten the parsed struct, and append an
    integer-typed copy of ``radiant_win`` as ``radiant_win_int``.

    :param df: dataframe with a binary/raw ``value`` column
    :return: flattened dataframe plus the ``radiant_win_int`` column
    """
    parsed = (
        df.selectExpr("CAST(value AS STRING)")
        .select(from_json("value", schema=schema_elastic).alias("data"))
        .select("data.*")
    )
    # Boolean flag cast to an integer so downstream ML code can consume it.
    radiant_win_as_int = parsed.radiant_win.cast(IntegerType())
    return parsed.withColumn("radiant_win_int", radiant_win_as_int)