def load_sentence_data_frame(sc, dataPath):
    """Load a sentence-embedding CSV and rebuild the dense vector column.

    Parameters
    ----------
    sc : SparkContext
        Active Spark context used to build the SQLContext.
    dataPath : str
        Path to a CSV with columns ``id``, ``sentence`` and ``vector``,
        where ``vector`` is a bracketed, space-separated string such as
        ``"[0.1 0.2 0.3]"`` (assumed from the regex/split below — confirm
        against the producer of this file).

    Returns
    -------
    DataFrame
        Columns ``id``, ``sentence``, ``vector`` (original string) and
        ``_vector`` (``Vectors.dense`` built from the parsed doubles).
    """
    # Reuse a single SQLContext for both read and createDataFrame.
    sql_ctx = SQLContext(sc)
    df = sql_ctx.read.format('com.databricks.spark.csv') \
        .options(header='true', inferschema='true') \
        .load(dataPath)

    # Strip the surrounding "[" and "]" from the serialized vector.
    # Raw string avoids the invalid-escape warning of "[\]\[]".
    # (The original first copied `vector` to `_vector` via withColumn and
    # immediately overwrote it here — the copy was redundant and is dropped.)
    df = df.select(
        df['id'], df['sentence'], df['vector'],
        regexp_replace(df['vector'], r"[\]\[]", "").alias("_vector"))

    # Split the cleaned string on spaces and cast the tokens to doubles.
    df = df.select(
        df['id'], df['sentence'], df['vector'],
        split(df['_vector'], " ").cast("array<double>").alias("_vector"))

    # One output Row per input row, so map() is the correct primitive; the
    # original used flatMap over a one-element set, which only worked
    # because Row happens to be hashable.
    rows = df.rdd.map(lambda x: Row(
        x['id'], x['sentence'], x['vector'], Vectors.dense(x['_vector'])))

    # Back to a DataFrame with the original column names restored.
    return sql_ctx.createDataFrame(rows) \
        .selectExpr("_1 as id", "_2 as sentence",
                    "_3 as vector", "_4 as _vector")
# In[13]:

# Parse `date_str` by trying each supported format in order and keeping the
# first that succeeds. F.to_date returns NULL when the string does not match
# the given format, so coalesce() over the ordered attempts is exactly
# equivalent to the original four-level when/otherwise pyramid (whose
# innermost when without otherwise also produced NULL when nothing matched),
# while parsing each format once instead of twice (isNotNull + value).
_DATE_FORMATS = ["yyyy-MM-dd", "yyyy MM dd", "yyyy MMM dd", "E, dd MMMM yy"]
df = df.withColumn(
    "date",
    F.coalesce(*[F.to_date(F.col("date_str"), fmt) for fmt in _DATE_FORMATS]),
)

# In[14]: