def toCSV(data):
    """Join the elements of an iterable into one comma-separated string."""
    return ','.join(str(d) for d in data)
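
# Quick illustration of toCSV on the kind of flattened tuple the pipeline below
# emits, i.e. (group, item1, count1, item2, count2, ...); the values here are
# made up for demonstration only.
assert toCSV(('2015-01-01', 'JFK', 120, 'LGA', 95)) == '2015-01-01,JFK,120,LGA,95'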


# In[5]:

from pyspark import SparkContext
from heapq import nlargest
import sys
import os

if __name__ == '__main__':
    # Usage: spark-submit <this script> <input_path> <output_dir>
    input_file = sys.argv[1]
    output_file = sys.argv[2]

    sc = SparkContext()
    rdd = sc.textFile(input_file)

    # processTrips (defined in an earlier cell of this notebook) emits
    # ((group, item), count) pairs from each partition of the raw trip records.
    result = (
        rdd.mapPartitionsWithIndex(processTrips)
           .reduceByKey(lambda x, y: x + y)                           # total count per (group, item)
           .map(lambda x: (x[0][0], x[0][1], x[1]))                   # flatten to (group, item, count)
           .groupBy(lambda x: x[0])                                   # collect all rows for each group
           .flatMap(lambda y: nlargest(3, y[1], key=lambda x: x[2]))  # keep the 3 largest counts per group
           .map(lambda x: (x[0], (x[1], x[2])))                       # re-key by group
           .reduceByKey(lambda x, y: x + y)                           # concatenate the top-3 (item, count) pairs
           .sortByKey()
           .map(lambda x: (x[0],) + x[1])                             # flatten to (group, item1, count1, ...)
           .map(toCSV)
    )
    result.saveAsTextFile(output_file)
    sc.stop()

# In[ ]:
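
# processTrips is not reproduced in this cell.  For reference only, a minimal
# hypothetical partition mapper consistent with the pipeline above might look
# like the sketch below: it has to yield ((group_key, item_key), count) pairs,
# because the chain immediately reduces by that composite key.  The column
# positions and field meanings here are assumptions, not the notebook's actual
# schema.

def process_trips_sketch(index, lines):
    lines = iter(lines)
    if index == 0:
        next(lines, None)       # assume the first partition starts with a header row
    for line in lines:
        fields = line.split(',')
        group_key = fields[0]   # hypothetical grouping column (e.g. a date)
        item_key = fields[1]    # hypothetical item column (e.g. a zone id)
        yield ((group_key, item_key), 1)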