from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

def main(self):
    # load configuration parameters (from a config file when working on a project)
    zk, topic, app_name, batch_duration, master = self.setConfiguration()

    # initiate the Spark context / streaming context
    conf = SparkConf().setMaster(master)
    sc = SparkContext(appName=app_name, conf=conf)
    ssc = StreamingContext(sc, batch_duration)

    # read data from Kafka; each record arrives as a (key, message) pair
    kvs = KafkaUtils.createStream(ssc, zk, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    lines.pprint()

    ssc.start()             # start the computation
    ssc.awaitTermination()  # wait for the computation to terminate
    sc.stop()
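Beyond pprint(), which only shows a sample of each micro-batch, the DStream supports the usual transformations before output. A minimal sketch (my addition, not part of the original code) that counts words within each micro-batch of the same lines stream, assuming the Kafka messages are plain text:

# word count per micro-batch; the counts reset every batch_duration seconds
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()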
from pyspark import SparkContext, SparkConf
import operator

sc = SparkContext(conf=SparkConf().setAppName('App').setMaster('local'))

# read in a local file
raw_data = sc.textFile('/data/twitter/twitter_sample_small.txt')

# define a method to parse each line: user and follower, separated by a tab
def parse_edge(s):
    user, follower = s.split('\t')
    return (int(user), int(follower))

# cache the intermediate RDD after parsing it
edges = raw_data.map(parse_edge).cache()

# apply aggregateByKey - see explanation below the code
fol_agg = edges.aggregateByKey(0, lambda acc, _: acc + 1, operator.add)

# top user/key with the most followers.
# itemgetter(1) ensures the values (aggregated counts), not the keys/user ids,
# are used for the comparison
top_user = fol_agg.top(1, key=operator.itemgetter(1))
print('%d %d' % (top_user[0][0], top_user[0][1]))

sc.stop()
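To make the three aggregateByKey arguments concrete, here is a toy run of the same pattern on hypothetical (user, follower) pairs (not the sample file): the zero value 0 seeds a counter per key in each partition, the seqOp increments it for every record while ignoring the follower id, and the combOp (operator.add) merges the per-partition counters.

from pyspark import SparkContext, SparkConf
import operator

sc = SparkContext(conf=SparkConf().setAppName('AggDemo').setMaster('local'))

# user 1 has three followers, user 2 has one; split across 2 partitions
pairs = sc.parallelize([(1, 10), (1, 11), (1, 12), (2, 10)], 2)

counts = pairs.aggregateByKey(0, lambda acc, _: acc + 1, operator.add)
print(counts.collect())  # e.g. [(2, 1), (1, 3)]
sc.stop()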