Example #1
import pytest

from operations import Operations


@pytest.fixture
def df(spark_context, hive_context):
    """
    Fixture for creating a test dataframe.

    :param spark_context: SparkContext object from fixture.
    :param hive_context: HiveContext object from fixture.
    :return: DataFrame object.
    """
    input = ['ace Beubiri 10 12744',
             'ace Bhutan 20 31284',
             'ace Bireu%c3%abn 30 20356',
             'ace Bireuen 40 20347',
             'ace Bishkek 50 14665',
             'ace John_Person%27s_first_100_days 60 14576',
             'ace Bolivia 70 32058',
             'ace Bosnia_H%c3%a8rz%c3%a8govina 80 38777']
    rdd = spark_context.parallelize(input)
    ops = Operations()
    df = ops.create_dataframe(rdd, hive_context)
    return df
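
A minimal usage sketch (not part of the original example): assuming create_dataframe produces one row per input line, a test could consume the fixture above like this.

def test_create_dataframe_row_count(df):
    # The fixture parallelizes eight input lines, so the resulting
    # dataframe is expected to contain eight rows.
    assert df.count() == 8
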
Example #2
from pyspark import SparkContext
from pyspark.sql import HiveContext

from operations import Operations

sc = SparkContext(appName="wikistats")

# Configure the s3n filesystem and AWS credentials before reading from S3
sc._jsc.hadoopConfiguration().set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "###")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "#####")

lines = sc.textFile("s3n://my.wiki.bucket.com/wikidata")

sqlContext = HiveContext(sc)

ops = Operations()
# Create the dataframe from the lines RDD
df = ops.create_dataframe(lines, sqlContext)
# Clean the 'pagename' column of encoded characters
df = ops.clean_string_column(df, 'pagename')
# Add columns for hour, day, month, year from the file name
df = ops.append_date_columns(df)

# Group by timeframes
hour_df, day_df, month_df, year_df = ops.aggregate_times(df)
# Create tokens from the pagename
hour_df = ops.append_tokens(hour_df)
# Add term frequency and inverse document frequency
hour_df = ops.append_tf_idf(hour_df)
# Create ranking
hour_df, day_df, month_df, year_df = ops.append_ranks(hour_df, day_df, month_df, year_df)

# Get the top 200 for each timeframe
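# Hedged sketch, not from the original listing: assuming append_ranks adds a
# 'rank' column to each timeframe dataframe, the top 200 rows per timeframe
# could be selected with a simple filter. The 'rank' column name and the
# top_* variable names below are assumptions.
top_hour_df = hour_df.filter(hour_df['rank'] <= 200)
top_day_df = day_df.filter(day_df['rank'] <= 200)
top_month_df = month_df.filter(month_df['rank'] <= 200)
top_year_df = year_df.filter(year_df['rank'] <= 200)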