def clean_and_add_date(
    df: pyspark.sql.dataframe.DataFrame,
    date_generated: list,
    spark: pyspark.sql.session.SparkSession,
) -> pyspark.sql.dataframe.DataFrame:
    """Expand *df* so each (SKU, Store) pair has a row for every date.

    The raw data omits a (SKU, Store, Date) row when both stock and sales
    are zero for that day. To restore a full month of records per item per
    store, build the cross product of all distinct (SKU, Store) pairs with
    every date in ``date_generated``, then right-join the original data
    onto that full schema.

    Args:
        df: Input data containing at least ``Date``, ``Store`` and ``SKU``
            columns.
        date_generated: Dates the output must cover, from the first to the
            last day of the dataset.
        spark: Active SparkSession used to create the date frame and run
            the SQL query.

    Returns:
        A DataFrame with one row per (SKU, Store, Date) combination; the
        original columns are null where the raw data had no record.
    """
    # Single-column DataFrame of every date the output must cover; the
    # auto-generated column is called "value", so rename it to "Date" to
    # match the join key below.
    date_df = spark.createDataFrame(date_generated, DateType())
    date_df = date_df.withColumnRenamed("value", "Date")

    # Register the DataFrame as a SQL temporary view, then pull the
    # distinct (SKU, Store) combinations present in the raw data.
    df.createOrReplaceTempView("dfView")
    sku_store_df = spark.sql("SELECT DISTINCT SKU, Store FROM dfView")

    # Full schema: every (SKU, Store) pair crossed with every date.
    full_schema = sku_store_df.crossJoin(date_df)

    # Right join keeps every schema row, adding the missing (all-zero)
    # days as rows with null measure columns.
    df = df.join(full_schema, on=["Date", "Store", "SKU"], how="right")
    return df
def get_store_item_concept_list(df: pyspark.sql.dataframe.DataFrame, spark) -> list:
    """Return the distinct (SKU, Store, Concept_NEW) combinations in *df*.

    Registers *df* as a temporary SQL view, selects the unique
    SKU/Store/Concept_NEW triples, and collects them to the driver.

    Returns:
        A list of 3-tuples ``(SKU, Store, Concept_NEW)``.
    """
    # Expose the DataFrame to Spark SQL under a temporary view name.
    df.createOrReplaceTempView("dfView")
    distinct_rows = spark.sql("SELECT DISTINCT SKU, Store, Concept_NEW FROM dfView")
    # Each Row is iterable, so tuple(row) yields (SKU, Store, Concept_NEW).
    return [tuple(row) for row in distinct_rows.collect()]