def reduce(self, operator):
    """
    Applies a Reduce transformation on a non-grouped DataSet.

    The ReduceFunction is called repeatedly, each call combining two
    elements into one new element of the same type, until a single
    element remains as the result of the transformation.

    :param operator: The ReduceFunction that is applied on the DataSet.
    :return: A ReduceOperator that represents the reduced DataSet.
    """
    self._finalize()
    # Bare functions are wrapped into a ReduceFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = ReduceFunction()
        wrapped.reduce = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.REDUCE
    desc.name = "PythonReduce"
    desc.parent = self._info
    desc.operator = operator
    desc.types = _createArrayTypeInfo()
    desc.key1 = self._child_chain[0].keys
    self._info.parallelism = desc.parallelism
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def flat_map(self, operator, types):
    """
    Applies a FlatMap transformation on a DataSet.

    The FlatMapFunction is called once per element of the DataSet and
    may emit any number of output elements, including none.

    :param operator: The FlatMapFunction that is called for each element of the DataSet.
    :param types: The type of the resulting DataSet.
    :return: A FlatMapOperator that represents the transformed DataSet.
    """
    # Bare functions are wrapped into a FlatMapFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = FlatMapFunction()
        wrapped.flat_map = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.FLATMAP
    desc.name = "PythonFlatMap"
    desc.parent = self._info
    desc.operator = operator
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.types = types
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def reduce_group(self, operator, combinable=False):
    """
    Applies a GroupReduce transformation.

    The GroupReduceFunction is called once per group of the DataSet
    (or once overall when applied to a non-grouped DataSet). It may
    iterate over all elements of the group and emit any number of
    output elements, including none.

    :param operator: The GroupReduceFunction that is applied on the DataSet.
    :return: A GroupReduceOperator that represents the reduced DataSet.
    """
    self._finalize()
    # Bare functions are wrapped into a GroupReduceFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = GroupReduceFunction()
        wrapped.reduce = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.GROUPREDUCE
    desc.name = "PythonGroupReduce"
    desc.parent = self._info
    desc.operator = operator
    desc.types = _createArrayTypeInfo()
    desc.key1 = self._child_chain[0].keys
    self._info.parallelism = desc.parallelism
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def reduce_group(self, operator, types, combinable=False):
    """
    Applies a GroupReduce transformation.

    The GroupReduceFunction is called once per group of the DataSet
    (or once overall when applied to a non-grouped DataSet). It may
    iterate over all elements of the group and emit any number of
    output elements, including none.

    :param operator: The GroupReduceFunction that is applied on the DataSet.
    :param types: The type of the resulting DataSet.
    :return: A GroupReduceOperator that represents the reduced DataSet.
    """
    # Bare functions are wrapped into a GroupReduceFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = GroupReduceFunction()
        wrapped.reduce = operator
        operator = wrapped
    # The reduce side gets a deep copy with combining disabled; the
    # original instance serves as the (combining) combine operator.
    reduce_op = copy.deepcopy(operator)
    reduce_op._combine = False
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.GROUPREDUCE
    desc.name = "PythonGroupReduce"
    desc.parent = self._info
    desc.operator = reduce_op
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.types = types
    desc.combine = combinable
    desc.combineop = operator
    desc.combineop._combine = True
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def _output(self, to_error):
    """Registers a print sink that writes this DataSet to stdout/stderr."""
    sink = OperationInfo()
    sink.identifier = _Identifier.SINK_PRINT
    sink.to_err = to_error
    sink.parent = self._info
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
def filter(self, operator):
    """
    Applies a Filter transformation on a DataSet.

    The FilterFunction is called for each element of the DataSet;
    elements for which it returns true are retained, all others
    are filtered out.

    :param operator: The FilterFunction that is called for each element of the DataSet.
    :return: A FilterOperator that represents the filtered DataSet.
    """
    # Bare functions are wrapped into a FilterFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = FilterFunction()
        wrapped.filter = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.FILTER
    desc.name = "PythonFilter"
    desc.parent = self._info
    desc.operator = operator
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    # Filtering never changes the element type, so it is inherited.
    desc.types = deduct_output_type(self._info)
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def reduce_group(self, operator, combinable=False):
    """
    Applies a GroupReduce transformation.

    The GroupReduceFunction is called once per group of the DataSet
    (or once overall when applied to a non-grouped DataSet). It may
    iterate over all elements of the group and emit any number of
    output elements, including none.

    :param operator: The GroupReduceFunction that is applied on the DataSet.
    :return: A GroupReduceOperator that represents the reduced DataSet.
    """
    self._finalize()
    # Bare functions are wrapped into a GroupReduceFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = GroupReduceFunction()
        wrapped.reduce = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.GROUPREDUCE
    desc.name = "PythonGroupReduce"
    desc.parent = self._info
    desc.operator = operator
    desc.types = _createArrayTypeInfo()
    desc.key1 = self._child_chain[0].keys
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def map(self, operator, types):
    """
    Applies a Map transformation on a DataSet.

    The MapFunction is called once per element of the DataSet and
    returns exactly one element per call.

    :param operator: The MapFunction that is called for each element of the DataSet.
    :param types: The type of the resulting DataSet
    :return: A MapOperator that represents the transformed DataSet
    """
    # Bare functions are wrapped into a MapFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = MapFunction()
        wrapped.map = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.MAP
    desc.name = "PythonMap"
    desc.parent = self._info
    desc.operator = operator
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.types = types
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def map_partition(self, operator):
    """
    Applies a MapPartition transformation on a DataSet.

    The MapPartitionFunction is called once per parallel partition;
    the whole partition is available through the given Iterator and
    each call may return an arbitrary number of results. How many
    elements each instance sees is non-deterministic and depends on
    the parallelism of the operation.

    :param operator: The MapFunction that is called for each element of the DataSet.
    :return: A MapOperator that represents the transformed DataSet
    """
    # Bare functions are wrapped into a MapPartitionFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = MapPartitionFunction()
        wrapped.map_partition = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.MAPPARTITION
    desc.name = "PythonMapPartition"
    desc.parent = self._info
    desc.operator = operator
    desc.types = _createArrayTypeInfo()
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def reduce(self, operator):
    """
    Applies a Reduce transformation on a non-grouped DataSet.

    The ReduceFunction is called repeatedly, each call combining two
    elements into one new element of the same type, until a single
    element remains as the result of the transformation.

    :param operator: The ReduceFunction that is applied on the DataSet.
    :return: A ReduceOperator that represents the reduced DataSet.
    """
    operator._set_grouping_keys(self._child_chain[0].keys)
    # Register every pending operation of the grouping chain.
    self._env._sets.extend(self._child_chain)
    # The reduce side gets a deep copy with combining disabled; the
    # original instance serves as the (combining) combine operator.
    reduce_op = copy.deepcopy(operator)
    reduce_op._combine = False
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.REDUCE
    desc.name = "PythonReduce"
    desc.parent = self._info
    desc.operator = reduce_op
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.combine = True
    desc.combineop = operator
    desc.combineop._combine = True
    desc.types = deduct_output_type(self._info)
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def filter(self, operator):
    """
    Applies a Filter transformation on a DataSet.

    The FilterFunction is called for each element of the DataSet;
    elements for which it returns true are retained, all others
    are filtered out.

    :param operator: The FilterFunction that is called for each element of the DataSet.
    :return: A FilterOperator that represents the filtered DataSet.
    """
    # Bare functions are wrapped into a FilterFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = FilterFunction()
        wrapped.filter = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.FILTER
    desc.name = "PythonFilter"
    desc.parent = self._info
    desc.operator = operator
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    # Filtering never changes the element type, so it is inherited.
    desc.types = deduct_output_type(self._info)
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def reduce_group(self, operator, types, combinable=False):
    """
    Applies a GroupReduce transformation.

    The GroupReduceFunction is called once per group of the DataSet
    (or once overall when applied to a non-grouped DataSet). It may
    iterate over all elements of the group and emit any number of
    output elements, including none.

    :param operator: The GroupReduceFunction that is applied on the DataSet.
    :param types: The type of the resulting DataSet.
    :return: A GroupReduceOperator that represents the reduced DataSet.
    """
    # Bare functions are wrapped into a GroupReduceFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = GroupReduceFunction()
        wrapped.reduce = operator
        operator = wrapped
    operator._set_grouping_keys(self._child_chain[0].keys)
    # All chain entries after the grouping itself are sort operations.
    operator._set_sort_ops([(op.field, op.order) for op in self._child_chain[1:]])
    # The reduce side gets a deep copy with combining disabled; the
    # original instance serves as the (combining) combine operator.
    reduce_op = copy.deepcopy(operator)
    reduce_op._combine = False
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.GROUPREDUCE
    desc.name = "PythonGroupReduce"
    desc.parent = self._info
    desc.operator = reduce_op
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.types = types
    desc.combine = combinable
    desc.combineop = operator
    desc.combineop._combine = True
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def map(self, operator, types):
    """
    Applies a Map transformation on a DataSet.

    The MapFunction is called once per element of the DataSet and
    returns exactly one element per call.

    :param operator: The MapFunction that is called for each element of the DataSet.
    :param types: The type of the resulting DataSet
    :return: A MapOperator that represents the transformed DataSet
    """
    # Bare functions are wrapped into a MapFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = MapFunction()
        wrapped.map = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.MAP
    desc.name = "PythonMap"
    desc.parent = self._info
    desc.operator = operator
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.types = types
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def flat_map(self, operator, types):
    """
    Applies a FlatMap transformation on a DataSet.

    The FlatMapFunction is called once per element of the DataSet and
    may emit any number of output elements, including none.

    :param operator: The FlatMapFunction that is called for each element of the DataSet.
    :param types: The type of the resulting DataSet.
    :return: A FlatMapOperator that represents the transformed DataSet.
    """
    # Bare functions are wrapped into a FlatMapFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = FlatMapFunction()
        wrapped.flat_map = operator
        operator = wrapped
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.FLATMAP
    desc.name = "PythonFlatMap"
    desc.parent = self._info
    desc.operator = operator
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.types = types
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def reduce(self, operator):
    """
    Applies a Reduce transformation on a non-grouped DataSet.

    The ReduceFunction is called repeatedly, each call combining two
    elements into one new element of the same type, until a single
    element remains as the result of the transformation.

    :param operator: The ReduceFunction that is applied on the DataSet.
    :return: A ReduceOperator that represents the reduced DataSet.
    """
    operator._set_grouping_keys(self._child_chain[0].keys)
    # Register every pending operation of the grouping chain.
    self._env._sets.extend(self._child_chain)
    # The reduce side gets a deep copy with combining disabled; the
    # original instance serves as the (combining) combine operator.
    reduce_op = copy.deepcopy(operator)
    reduce_op._combine = False
    desc = OperationInfo()
    result = OperatorSet(self._env, desc)
    desc.identifier = _Identifier.REDUCE
    desc.name = "PythonReduce"
    desc.parent = self._info
    desc.operator = reduce_op
    desc.meta = "|".join([str(inspect.getmodule(operator)), str(operator.__class__.__name__)])
    desc.combine = True
    desc.combineop = operator
    desc.combineop._combine = True
    desc.types = deduct_output_type(self._info)
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def _write_text(self, path, write_mode):
    """Registers a text-file sink for this DataSet at the given path."""
    sink = OperationInfo()
    sink.identifier = _Identifier.SINK_TEXT
    sink.path = path
    sink.write_mode = write_mode
    sink.parent = self._info
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
def with_broadcast_set(self, name, set):
    """
    Adds a broadcast DataSet to this operator under the given name.

    :param name: The name under which the broadcast variable is registered.
    :param set: The DataSet to broadcast.
    :return: This operator, for call chaining.
    """
    bcvar = OperationInfo()
    bcvar.identifier = _Identifier.BROADCAST
    bcvar.name = name
    bcvar.parent = self._info
    bcvar.other = set._info
    self._info.bcvars.append(bcvar)
    self._env._broadcast.append(bcvar)
    return self
def with_broadcast_set(self, name, set):
    """
    Adds a broadcast DataSet to this operator under the given name.

    :param name: The name under which the broadcast variable is registered.
    :param set: The DataSet to broadcast.
    :return: This operator, for call chaining.
    """
    bcvar = OperationInfo()
    bcvar.name = name
    bcvar.parent = self._info
    bcvar.other = set._info
    self._info.bcvars.append(bcvar)
    # The broadcast is also registered as a child of the broadcast set.
    set._info.children.append(bcvar)
    self._env._broadcast.append(bcvar)
    return self
def output(self, to_error=False):
    """
    Writes a DataSet to the standard output stream (stdout).
    """
    sink = OperationInfo()
    sink.identifier = _Identifier.SINK_PRINT
    sink.to_err = to_error
    sink.parent = self._info
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
def _write_csv(self, path, line_delimiter, field_delimiter, write_mode):
    """Registers a CSV-file sink for this DataSet at the given path."""
    sink = OperationInfo()
    sink.identifier = _Identifier.SINK_CSV
    sink.path = path
    sink.delimiter_line = line_delimiter
    sink.delimiter_field = field_delimiter
    sink.write_mode = write_mode
    sink.parent = self._info
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
def _distinct(self, fields):
    """Registers a Distinct transformation on the given key fields."""
    # Distinct operates on key/value pairs keyed by the given fields.
    self._info.types = _createKeyValueTypeInfo(len(fields))
    desc = OperationInfo()
    result = DataSet(self._env, desc)
    desc.identifier = _Identifier.DISTINCT
    desc.keys = fields
    desc.parent = self._info
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def _output(self, to_error):
    """Registers a print sink and returns the resulting DataSink."""
    sink = OperationInfo()
    result = DataSink(self._env, sink)
    sink.identifier = _Identifier.SINK_PRINT
    sink.to_err = to_error
    sink.parent = self._info
    self._info.parallelism = sink.parallelism
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
    return result
def _cross(self, other_set, identifier):
    """Registers a Cross transformation between this set and other_set."""
    desc = OperationInfo()
    result = CrossOperator(self._env, desc)
    desc.identifier = identifier
    desc.parent = self._info
    desc.other = other_set._info
    # Both inputs record the cross as a child operation.
    self._info.children.append(desc)
    other_set._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def _group_by(self, keys):
    """Registers a grouping on the given keys and returns an UnsortedGrouping."""
    desc = OperationInfo()
    # The chain collects the grouping plus any subsequent sort operations.
    chain = [desc]
    result = UnsortedGrouping(self._env, desc, chain)
    desc.identifier = _Identifier.GROUP
    desc.keys = keys
    desc.parent = self._info
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def _createProjector(env, info):
    """Creates and registers a Projector (an identity Map) on the given operation."""
    desc = OperationInfo()
    result = Projector(env, desc)
    desc.identifier = _Identifier.MAP
    desc.name = "Projector"
    desc.operator = MapFunction()
    desc.parent = info
    desc.types = _createArrayTypeInfo()
    info.children.append(desc)
    env._sets.append(desc)
    return result
def _write_text(self, path, write_mode):
    """Registers a text-file sink and returns the resulting DataSink."""
    sink = OperationInfo()
    result = DataSink(self._env, sink)
    sink.identifier = _Identifier.SINK_TEXT
    sink.path = path
    sink.write_mode = write_mode
    sink.parent = self._info
    self._info.parallelism = sink.parallelism
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
    return result
def _reduce_group(self, operator, combinable=False):
    """
    Builds (but does not register) the OperationInfo for a GroupReduce.

    :param operator: The GroupReduceFunction (or bare function) to apply.
    :return: The configured OperationInfo; the caller is responsible for
             registering it with the environment.
    """
    # Bare functions are wrapped into a GroupReduceFunction instance.
    if isinstance(operator, TYPES.FunctionType):
        wrapped = GroupReduceFunction()
        wrapped.reduce = operator
        operator = wrapped
    desc = OperationInfo()
    desc.identifier = _Identifier.GROUPREDUCE
    desc.name = "PythonGroupReduce"
    desc.parent = self._info
    desc.operator = operator
    desc.types = _createArrayTypeInfo()
    return desc
def write_text(self, path, write_mode=WriteMode.NO_OVERWRITE):
    """
    Writes a DataSet as a text file to the specified location.

    :param path: The path pointing to the location the text file is written to.
    :param write_mode: OutputFormat.WriteMode value, indicating whether files should be overwritten
    """
    sink = OperationInfo()
    sink.identifier = _Identifier.SINK_TEXT
    sink.path = path
    sink.write_mode = write_mode
    sink.parent = self._info
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
def rebalance(self):
    """
    Enforces a re-balancing of the DataSet, i.e. the DataSet is evenly
    distributed over all parallel instances of the following task. This
    can help to improve performance in case of heavy data skew and
    compute-intensive operations.

    Important: This operation shuffles the whole DataSet over the network
    and can take significant amount of time.

    :return: The re-balanced DataSet.
    """
    desc = OperationInfo()
    result = DataSet(self._env, desc)
    desc.identifier = _Identifier.REBALANCE
    desc.parent = self._info
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def first(self, count):
    """
    Returns a new set containing the first n elements in this DataSet.

    :param count: The desired number of elements.
    :return: A DataSet containing the elements.
    """
    desc = OperationInfo()
    result = DataSet(self._env, desc)
    desc.identifier = _Identifier.FIRST
    desc.count = count
    desc.parent = self._info
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def _partition_by_hash(self, fields):
    """
    Hash-partitions a DataSet on the specified key fields.

    Important: This operation shuffles the whole DataSet over the network
    and can take significant amount of time.

    :param fields: The field indexes on which the DataSet is hash-partitioned.
    :return: The partitioned DataSet.
    """
    # Partitioning operates on key/value pairs keyed by the given fields.
    self._info.types = _createKeyValueTypeInfo(len(fields))
    desc = OperationInfo()
    result = DataSet(self._env, desc)
    desc.identifier = _Identifier.PARTITION_HASH
    desc.keys = fields
    desc.parent = self._info
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def write_csv(self, path, line_delimiter="\n", field_delimiter=',', write_mode=WriteMode.NO_OVERWRITE):
    """
    Writes a Tuple DataSet as a CSV file to the specified location.

    Note: Only a Tuple DataSet can be written as a CSV file.

    :param path: The path pointing to the location the CSV file is written to.
    :param write_mode: OutputFormat.WriteMode value, indicating whether files should be overwritten
    """
    sink = OperationInfo()
    sink.identifier = _Identifier.SINK_CSV
    sink.path = path
    sink.delimiter_line = line_delimiter
    sink.delimiter_field = field_delimiter
    sink.write_mode = write_mode
    sink.parent = self._info
    self._info.sinks.append(sink)
    self._env._sinks.append(sink)
def union(self, other_set):
    """
    Creates a union of this DataSet with an other DataSet.

    The other DataSet must be of the same data type.

    :param other_set: The other DataSet which is unioned with the current DataSet.
    :return: The resulting DataSet.
    """
    desc = OperationInfo()
    result = DataSet(self._env, desc)
    desc.identifier = _Identifier.UNION
    desc.parent = self._info
    desc.other = other_set._info
    # Both inputs record the union as a child operation.
    self._info.children.append(desc)
    other_set._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def project(self, *fields):
    """
    Applies a Project transformation on a Tuple DataSet.

    Note: Only Tuple DataSets can be projected. The transformation
    projects each Tuple of the DataSet onto a (sub)set of fields.

    :param fields: The field indexes of the input tuples that are retained.
                   The order of fields in the output tuple corresponds to the order of field indexes.
    :return: The projected DataSet.
    """
    desc = OperationInfo()
    result = DataSet(self._env, desc)
    desc.identifier = _Identifier.PROJECTION
    desc.keys = fields
    desc.parent = self._info
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result
def co_group(self, other_set):
    """
    Initiates a CoGroup transformation which combines the elements of two
    DataSets into one DataSet.

    Each DataSet is grouped individually on a key, and groups of both
    DataSets with equal keys are handed together to a CoGroupFunction.
    If one DataSet has a group with no matching key in the other, the
    CoGroupFunction is called with an empty group for the missing side.
    The CoGroupFunction can iterate over the elements of both groups and
    return any number of elements, including none.

    :param other_set: The other DataSet of the CoGroup transformation.
    :return: A CoGroupOperator to continue the definition of the CoGroup transformation.
    """
    desc = OperationInfo()
    # The second input records the co-group before key selection begins.
    other_set._info.children.append(desc)
    result = CoGroupOperatorWhere(self._env, desc)
    desc.identifier = _Identifier.COGROUP
    desc.parent = self._info
    desc.other = other_set._info
    self._info.children.append(desc)
    return result
def sort_group(self, field, order):
    """
    Sorts Tuple elements within a group on the specified field in the
    specified Order.

    Note: Only groups of Tuple elements can be sorted. Groups can be
    sorted by multiple fields by chaining sort_group() calls.

    :param field: The Tuple field on which the group is sorted.
    :param order: The Order in which the specified Tuple field is sorted. See DataSet.Order.
    :return: A SortedGrouping with specified order of group element.
    """
    desc = OperationInfo()
    result = SortedGrouping(self._env, desc, self._child_chain)
    desc.identifier = _Identifier.SORT
    desc.field = field
    desc.order = order
    desc.parent = self._info
    self._info.children.append(desc)
    # The sort joins the grouping chain so later operators can read it.
    self._child_chain.append(desc)
    self._env._sets.append(desc)
    return result
def group_by(self, *keys):
    """
    Groups a Tuple DataSet using field position keys.

    Note: Field position keys can only be specified for Tuple DataSets.
    The field position keys specify the fields of Tuples on which the
    DataSet is grouped. This method returns an UnsortedGrouping on which
    one of the following grouping transformations can be applied:
    sort_group() to get a SortedGrouping, reduce() to apply a Reduce
    transformation, group_reduce() to apply a GroupReduce transformation.

    :param keys: One or more field positions on which the DataSet will be grouped.
    :return: A Grouping on which a transformation needs to be applied to obtain a transformed DataSet.
    """
    desc = OperationInfo()
    # The chain collects the grouping plus any subsequent sort operations.
    chain = [desc]
    result = UnsortedGrouping(self._env, desc, chain)
    desc.identifier = _Identifier.GROUP
    desc.keys = keys
    desc.parent = self._info
    self._info.children.append(desc)
    self._env._sets.append(desc)
    return result