Python pack_longの例

プログラミング言語: Python

名前空間/パッケージ名: pyspark.serializers

メソッド/関数: pack_long

hotexamples.comのコード掲載数: 5

Python pack_long - 5件のコード例が見つかりました。すべてオープンソースプロジェクトから抽出されたPythonのpyspark.serializers.pack_longの実例で、最も評価が高いものを厳選しています。コード例の評価を行っていただくことで、より質の高いコード例が表示されるようになります。

コード例 #1

ファイルを表示

        def add_shuffle_key(split, iterator):

            buckets = defaultdict(list)
            c, batch = 0, min(10 * numPartitions, 1000)

            for k, v in iterator:
                buckets[partitionFunc(k) % numPartitions].append((k, v))
                c += 1

                # check used memory and avg size of chunk of objects
                if (c % 1000 == 0 and get_used_memory() > limit or c > batch):
                    n, size = len(buckets), 0
                    for split in buckets.keys():
                        yield pack_long(split)
                        d = outputSerializer.dumps(buckets[split])
                        del buckets[split]
                        yield d
                        size += len(d)

                    avg = (size / n) >> 20
                    # let 1M < avg < 10M
                    if avg < 1:
                        batch *= 1.5
                    elif avg > 10:
                        batch = max(batch / 1.5, 1)
                    c = 0

            for split, items in buckets.iteritems():
                yield pack_long(split)
                yield outputSerializer.dumps(items)

コード例 #2

ファイルを表示

ファイル: dstream.py プロジェクト: giworld/spark

        def add_shuffle_key(split, iterator):

            buckets = defaultdict(list)
            c, batch = 0, min(10 * numPartitions, 1000)

            for k, v in iterator:
                buckets[partitionFunc(k) % numPartitions].append((k, v))
                c += 1

                # check used memory and avg size of chunk of objects
                if (c % 1000 == 0 and get_used_memory() > limit
                        or c > batch):
                    n, size = len(buckets), 0
                    for split in buckets.keys():
                        yield pack_long(split)
                        d = outputSerializer.dumps(buckets[split])
                        del buckets[split]
                        yield d
                        size += len(d)

                    avg = (size / n) >> 20
                    # let 1M < avg < 10M
                    if avg < 1:
                        batch *= 1.5
                    elif avg > 10:
                        batch = max(batch / 1.5, 1)
                    c = 0

            for split, items in buckets.iteritems():
                yield pack_long(split)
                yield outputSerializer.dumps(items)

コード例 #3

ファイルを表示

ファイル: rdd.py プロジェクト: WANdisco/incubator-spark

        def add_shuffle_key(split, iterator):

            buckets = defaultdict(list)

            for (k, v) in iterator:
                buckets[partitionFunc(k) % numPartitions].append((k, v))
            for (split, items) in buckets.iteritems():
                yield pack_long(split)
                yield dump_pickle(Batch(items))

コード例 #4

ファイルを表示

ファイル: rdd.py プロジェクト: xoltar/spark

        def add_shuffle_key(split, iterator):

            buckets = defaultdict(list)

            for (k, v) in iterator:
                buckets[partitionFunc(k) % numPartitions].append((k, v))
            for (split, items) in buckets.iteritems():
                yield pack_long(split)
                yield outputSerializer.dumps(items)

コード例 #5

ファイルを表示

        def add_shuffle_key(split, iterator):

            buckets = defaultdict(list)

            for (k, v) in iterator:
                buckets[partitionFunc(k) % numPartitions].append((k, v))
            for (split, items) in buckets.iteritems():
                yield pack_long(split)
                yield dump_pickle(Batch(items))