def parallelize(self, c, numSlices=None):
    """
    Distribute a local Python collection to form an RDD.
    """
    numSlices = numSlices or self.defaultParallelism
    # Calling the Java parallelize() method with an ArrayList is too slow,
    # because it sends O(n) Py4J commands. As an alternative, serialized
    # objects are written to a temporary file and read back on the JVM side
    # through PythonRDD.readRDDFromPickleFile().
    tempFile = NamedTemporaryFile(delete=False, dir=self._temp_dir)
    if self.batchSize != 1:
        c = batched(c, self.batchSize)
    for x in c:
        write_with_length(dump_pickle(x), tempFile)
    tempFile.close()
    readRDDFromPickleFile = self._jvm.PythonRDD.readRDDFromPickleFile
    jrdd = readRDDFromPickleFile(self._jsc, tempFile.name, numSlices)
    return RDD(jrdd, self)
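# For reference, minimal sketches of the serialization helpers the method
# above relies on, inferred from how they are called here. The real
# implementations live in pyspark's serializers module; exact names and
# signatures below are assumptions.
import pickle
import struct

def dump_pickle(obj):
    # Serialize one object (or one batch of objects) to a pickle byte string.
    return pickle.dumps(obj, protocol=2)

def write_with_length(obj, stream):
    # Length-prefixed framing: a 4-byte big-endian length, then the payload,
    # so the reader can split the file back into individual records.
    stream.write(struct.pack("!i", len(obj)))
    stream.write(obj)

def batched(iterator, batchSize):
    # Group an iterator into lists of at most batchSize elements, so that
    # many small objects are pickled and framed as one record.
    batch = []
    for item in iterator:
        batch.append(item)
        if len(batch) == batchSize:
            yield batch
            batch = []
    if batch:
        yield batch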
def parallelize(self, c, numSlices=None):
    """
    Distribute a local Python collection to form an RDD.
    """
    numSlices = numSlices or self.defaultParallelism
    # Calling the Java parallelize() method with an ArrayList is too slow,
    # because it sends O(n) Py4J commands. As an alternative, serialized
    # objects are written to a temporary file and read back on the JVM side
    # through readRDDFromPickleFile().
    tempFile = NamedTemporaryFile(delete=False)
    # Clean up the temporary file when the interpreter exits.
    atexit.register(lambda: os.unlink(tempFile.name))
    if self.batchSize != 1:
        c = batched(c, self.batchSize)
    for x in c:
        write_with_length(dump_pickle(x), tempFile)
    tempFile.close()
    jrdd = self._readRDDFromPickleFile(self._jsc, tempFile.name, numSlices)
    return RDD(jrdd, self)
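# The cleanup idiom above, in isolation: create a temp file that survives
# close() (delete=False), then register its removal for interpreter exit.
# This standalone demo is an assumption about intent, not part of the source.
import atexit
import os
from tempfile import NamedTemporaryFile

f = NamedTemporaryFile(delete=False)
atexit.register(lambda: os.unlink(f.name))
f.write(b"payload")
f.close()  # the file remains on disk until the process exits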
def parallelize(self, c, numSlices=None):
    """
    Distribute a local Python collection to form an RDD.

    >>> sc.parallelize(range(5), 5).glom().collect()
    [[0], [1], [2], [3], [4]]
    """
    numSlices = numSlices or self.defaultParallelism
    # Calling the Java parallelize() method with an ArrayList is too slow,
    # because it sends O(n) Py4J commands. As an alternative, serialized
    # objects are written to a temporary file and read back on the JVM side
    # through PythonRDD.readRDDFromPickleFile().
    tempFile = NamedTemporaryFile(delete=False, dir=self._temp_dir)
    # Make sure we distribute data evenly if it's smaller than self.batchSize
    if "__len__" not in dir(c):
        c = list(c)  # Make it a list so we can compute its length
    batchSize = min(len(c) // numSlices, self.batchSize)
    if batchSize > 1:
        c = batched(c, batchSize)
    for x in c:
        write_with_length(dump_pickle(x), tempFile)
    tempFile.close()
    readRDDFromPickleFile = self._jvm.PythonRDD.readRDDFromPickleFile
    jrdd = readRDDFromPickleFile(self._jsc, tempFile.name, numSlices)
    return RDD(jrdd, self)
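# A quick illustration of the batch-size computation above (the helper name
# is hypothetical, written only for this demo): with 5 elements across
# 5 slices the computed batch size is 1, so batching is skipped and each
# element can land in its own partition, which is exactly what the glom()
# doctest checks.
def effective_batch_size(collection_len, numSlices, configured_batch_size):
    return min(collection_len // numSlices, configured_batch_size)

assert effective_batch_size(5, 5, 10) == 1      # doctest case: no batching
assert effective_batch_size(1000, 4, 10) == 10  # large input: full batches
assert effective_batch_size(3, 4, 10) == 0      # tiny input: no batching either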
def batched_func(split, iterator):
    # Re-group the wrapped per-partition function's output into batches of
    # batchSize; oldfunc and batchSize are closure variables from the
    # enclosing scope.
    return batched(oldfunc(split, iterator), batchSize)
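# A self-contained sketch of how a wrapper like batched_func composes with
# the batched() helper: the original per-partition function's output stream
# is regrouped into fixed-size batches before serialization. The factory
# function here is hypothetical; only the names oldfunc and batchSize come
# from the snippet above.
def make_batched_func(oldfunc, batchSize):
    def batched_func(split, iterator):
        return batched(oldfunc(split, iterator), batchSize)
    return batched_func

# Example: a per-partition function that doubles its input, re-batched in 2s.
double = lambda split, it: (x * 2 for x in it)
f = make_batched_func(double, 2)
assert list(f(0, iter([1, 2, 3, 4, 5]))) == [[2, 4], [6, 8], [10]]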