Python SparkContext.parallelized Beispiele

Programmiersprache: Python

Namespace / Paketname: pyspark

Klasse / Typ: SparkContext

Methode / Funktion: parallelized

Beispiele auf hotexamples.com: 1

Python SparkContext.parallelized - 1 Beispiele gefunden. Dies sind die am besten bewerteten Python Beispiele für die pyspark.SparkContext.parallelized, die aus Open Source-Projekten extrahiert wurden. Sie können Beispiele bewerten, um die Qualität der Beispiele zu verbessern.

Häufig verwendete Methoden

Anzeigen Verbergen

setLogLevel(30)

setSystemProperty(30)

setCheckpointDir(30)

getConf(30)

parallelize(30)

pickleFile(30)

broadcast(30)

emptyRDD(30)

newAPIHadoopFile(30)

binaryFiles(30)

addPyFile(30)

addFile(30)

accumulator(30)

getOrCreate(30)

SparkContext(30)

sequenceFile(30)

newAPIHadoopRDD(25)

_ensure_initialized(14)

createDataFrame(11)

hadoopFile(10)

show_profiles(9)

range(8)

dump_profiles(6)

mongoRDD(6)

binaryRecords(6)

map(4)

setLocalProperty(3)

runJob(3)

flatMap(2)

cassandraTable(2)

collect(2)

close(2)

setJobGroup(2)

paralellize(1)

neo4jTable(1)

neo4jConfig(1)

parallelise(1)

BSONFileRDD(1)

parallelized(1)

parallize(1)

reduceByKey(1)

sample(1)

mongoPairRDD(1)

setMaster(1)

show_profile(1)

sortBy(1)

saveAsTextFile(1)

hadoopConfiguration(1)

mixin(1)

filter(1)

Beispiel #1

Datei anzeigen

Datei: ccextra.py Projekt: Bell-Wang/computer-system

    words = lines.flatMap(parseContext)
    words_swap = words.map(lambda (x, y): (y, x))
    wordcount = words.map(lambda s: (s, 1)).reduceByKey(lambda a, b: a + b)
    wordcount_page = words_swap.map(lambda s: (s, 1)).reduceByKey(lambda a, b: a + b)
    count_page = words.map(lambda (a, b): (a, 1)).reduceByKey(lambda a, b: a + b)
    doc_word = words_swap.distinct().map(lambda (a, b): (a, 1)).reduceByKey(lambda a, b: a + b)
    app = []
    for (((id, title), word), n) in wordcount.collect():
        word_page = words.filter(lambda x: (id, title) in x).count()
        word_all_page = words.filter(lambda x: word in x).distinct().count()
        tf_idf = (n / word_page) * math.log((doc_count / word_all_page))
        app.append([(id, title, word, tf_idf)])


    ##part2 read as RDD
    v = sc.parallelized(app)
    trans = v.map(lambda (a, b): (a, list(b))).groupByKey()  ##apend word as list by id
    ##key pair similarity(e-distance)
    def similar(wf):
        fun_result = []
        list1 = {}
        list2 = {}
        for item in v[0][1]:
            fun_result.append(item[0])
            list1.setdefault(item[0],item[1])
        for item in v[1][1]:
            if item[0] not in fun_result:
                fun_result.append(item[0])
            list2.setdefault(item[0],item[1])
        result1 = []
        result2 = []