Code Example #1
File: functions.py Project: qiming82/spark
def coalesce(*cols):
    """Returns the first column that is not null.

    >>> cDf = sqlContext.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
    >>> cDf.show()
    +----+----+
    |   a|   b|
    +----+----+
    |null|null|
    |   1|null|
    |null|   2|
    +----+----+

    >>> cDf.select(coalesce(cDf["a"], cDf["b"])).show()
    +-------------+
    |Coalesce(a,b)|
    +-------------+
    |         null|
    |            1|
    |            2|
    +-------------+

    >>> cDf.select('*', coalesce(cDf["a"], lit(0.0))).show()
    +----+----+---------------+
    |   a|   b|Coalesce(a,0.0)|
    +----+----+---------------+
    |null|null|            0.0|
    |   1|null|            1.0|
    |null|   2|            0.0|
    +----+----+---------------+
    """
    sc = SparkContext._active_spark_context
    jc = sc._jvm.functions.coalesce(_to_seq(sc, cols, _to_java_column))
    return Column(jc)
Code Example #2
File: functions.py Project: yyzdtc2009/spark
 def _(col1, col2):
     sc = SparkContext._active_spark_context
     # Users might pass plain ints for simplicity; ints would raise an error on the JVM side, so coerce them to floats.
     jc = getattr(sc._jvm.functions, name)(
         col1._jc if isinstance(col1, Column) else float(col1),
         col2._jc if isinstance(col2, Column) else float(col2))
     return Column(jc)
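
The closure above is returned by a factory that binds a JVM function name. Below is a minimal sketch of such a factory; the factory name _create_binary_mathfunction, the docstrings, and the wired-up names are assumptions for illustration, not taken from the snippet.

from pyspark import SparkContext
from pyspark.sql import Column


def _create_binary_mathfunction(name, doc=""):
    # Hypothetical factory: builds a two-argument wrapper around the JVM
    # function of the given name and attaches a documentation string.
    def _(col1, col2):
        sc = SparkContext._active_spark_context
        # Plain ints are coerced to float so the JVM call does not fail.
        jc = getattr(sc._jvm.functions, name)(
            col1._jc if isinstance(col1, Column) else float(col1),
            col2._jc if isinstance(col2, Column) else float(col2))
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _


# Illustrative wiring; atan2 and hypot exist as JVM-side functions.
atan2 = _create_binary_mathfunction("atan2", "Angle from rectangular to polar coordinates.")
hypot = _create_binary_mathfunction("hypot", "sqrt(a^2 + b^2) without intermediate overflow.")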
Code Example #3
File: functions.py Project: qiming82/spark
def rand(seed=None):
    """Generates a random column with i.i.d. samples from U[0.0, 1.0].
    """
    sc = SparkContext._active_spark_context
    if seed is not None:
        # compare against None (not truthiness) so an explicit seed of 0 is honored
        jc = sc._jvm.functions.rand(seed)
    else:
        jc = sc._jvm.functions.rand()
    return Column(jc)
Code Example #4
File: functions.py Project: qiming82/spark
def randn(seed=None):
    """Generates a column with i.i.d. samples from the standard normal distribution.
    """
    sc = SparkContext._active_spark_context
    if seed is not None:
        # compare against None (not truthiness) so an explicit seed of 0 is honored
        jc = sc._jvm.functions.randn(seed)
    else:
        jc = sc._jvm.functions.randn()
    return Column(jc)
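
Neither example #3 nor #4 carries a doctest. A short usage sketch follows; the seed value and aliases are chosen for illustration, df is the DataFrame assumed by the other doctests, and the resulting values are random.

# Usage sketch: one uniform and one standard-normal column, both seeded.
df.select(rand(seed=26).alias("uniform"), randn(seed=26).alias("normal")).show()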
Code Example #5
File: functions.py Project: swapniltadasare/spark
def sparkPartitionId():
    """A column for partition ID of the Spark task.

    Note that this is non-deterministic because it depends on data partitioning and task scheduling.

    >>> df.repartition(1).select(sparkPartitionId().alias("pid")).collect()
    [Row(pid=0), Row(pid=0)]
    """
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.sparkPartitionId())
Code Example #6
File: mathfunctions.py Project: yuanruq/spark
 def _(col1, col2):
     sc = SparkContext._active_spark_context
     # Users might pass plain ints for simplicity; ints would raise an error on the JVM side, so coerce them to floats.
     if type(col1) is int:
         col1 = col1 * 1.0
     if type(col2) is int:
         col2 = col2 * 1.0
     jc = getattr(sc._jvm.mathfunctions,
                  name)(col1._jc if isinstance(col1, Column) else col1,
                        col2._jc if isinstance(col2, Column) else col2)
     return Column(jc)
Code Example #7
File: functions.py Project: bopopescu/spark-14
def approxCountDistinct(col, rsd=None):
    """Returns a new :class:`Column` for approximate distinct count of ``col``.

    >>> df.agg(approxCountDistinct(df.age).alias('c')).collect()
    [Row(c=2)]
    """
    sc = SparkContext._active_spark_context
    if rsd is None:
        jc = sc._jvm.functions.approxCountDistinct(_to_java_column(col))
    else:
        jc = sc._jvm.functions.approxCountDistinct(_to_java_column(col), rsd)
    return Column(jc)
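
The doctest only exercises the default path; passing the optional relative standard deviation takes the second branch. A hedged usage sketch, where the 0.05 tolerance is an arbitrary illustrative value:

# Usage sketch: an explicit rsd (maximum estimation error) routes through the
# two-argument JVM call instead of the default.
df.agg(approxCountDistinct(df.age, 0.05).alias('c')).collect()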
Code Example #8
File: functions.py Project: qiming82/spark
def countDistinct(col, *cols):
    """Returns a new :class:`Column` for distinct count of ``col`` or ``cols``.

    >>> df.agg(countDistinct(df.age, df.name).alias('c')).collect()
    [Row(c=2)]

    >>> df.agg(countDistinct("age", "name").alias('c')).collect()
    [Row(c=2)]
    """
    sc = SparkContext._active_spark_context
    jc = sc._jvm.functions.countDistinct(_to_java_column(col), _to_seq(sc, cols, _to_java_column))
    return Column(jc)
Code Example #9
File: functions.py Project: bopopescu/spark-14
def countDistinct(col, *cols):
    """Returns a new :class:`Column` for distinct count of ``col`` or ``cols``.

    >>> df.agg(countDistinct(df.age, df.name).alias('c')).collect()
    [Row(c=2)]

    >>> df.agg(countDistinct("age", "name").alias('c')).collect()
    [Row(c=2)]
    """
    sc = SparkContext._active_spark_context
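    # ListConverter comes from py4j.java_collections; it turns the Python list
    # of Java columns into a Java list that can be passed through the gateway.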
    jcols = ListConverter().convert([_to_java_column(c) for c in cols],
                                    sc._gateway._gateway_client)
    jc = sc._jvm.functions.countDistinct(_to_java_column(col),
                                         sc._jvm.PythonUtils.toSeq(jcols))
    return Column(jc)
Code Example #10
File: functions.py Project: qiming82/spark
def array(*cols):
    """Creates a new array column.

    :param cols: list of column names (string) or list of :class:`Column` expressions that have
        the same data type.

    >>> df.select(array('age', 'age').alias("arr")).collect()
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
    >>> df.select(array([df.age, df.age]).alias("arr")).collect()
    [Row(arr=[2, 2]), Row(arr=[5, 5])]
    """
    sc = SparkContext._active_spark_context
    if len(cols) == 1 and isinstance(cols[0], (list, set)):
        cols = cols[0]
    jc = sc._jvm.functions.array(_to_seq(sc, cols, _to_java_column))
    return Column(jc)
Code Example #11
File: functions.py Project: qiming82/spark
def struct(*cols):
    """Creates a new struct column.

    :param cols: list of column names (string) or list of :class:`Column` expressions
        that are named or aliased.

    >>> df.select(struct('age', 'name').alias("struct")).collect()
    [Row(struct=Row(age=2, name=u'Alice')), Row(struct=Row(age=5, name=u'Bob'))]
    >>> df.select(struct([df.age, df.name]).alias("struct")).collect()
    [Row(struct=Row(age=2, name=u'Alice')), Row(struct=Row(age=5, name=u'Bob'))]
    """
    sc = SparkContext._active_spark_context
    if len(cols) == 1 and isinstance(cols[0], (list, set)):
        cols = cols[0]
    jc = sc._jvm.functions.struct(_to_seq(sc, cols, _to_java_column))
    return Column(jc)
Code Example #12
File: functions.py Project: swapniltadasare/spark
def monotonicallyIncreasingId():
    """A column that generates monotonically increasing 64-bit integers.

    The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive.
    The current implementation puts the partition ID in the upper 31 bits, and the record number
    within each partition in the lower 33 bits. The assumption is that the data frame has
    less than 1 billion partitions, and each partition has less than 8 billion records.

    As an example, consider a :class:`DataFrame` with two partitions, each with 3 records.
    This expression would return the following IDs:
    0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.

    >>> df0 = sc.parallelize(range(2), 2).mapPartitions(lambda x: [(1,), (2,), (3,)]).toDF(['col1'])
    >>> df0.select(monotonicallyIncreasingId().alias('id')).collect()
    [Row(id=0), Row(id=1), Row(id=2), Row(id=8589934592), Row(id=8589934593), Row(id=8589934594)]
    """
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.monotonicallyIncreasingId())
Code Example #13
File: functions.py Project: qiming82/spark
def when(condition, value):
    """Evaluates a list of conditions and returns one of multiple possible result expressions.
    If :func:`Column.otherwise` is not invoked, None is returned for unmatched conditions.

    :param condition: a boolean :class:`Column` expression.
    :param value: a literal value, or a :class:`Column` expression.

    >>> df.select(when(df['age'] == 2, 3).otherwise(4).alias("age")).collect()
    [Row(age=3), Row(age=4)]

    >>> df.select(when(df.age == 2, df.age + 1).alias("age")).collect()
    [Row(age=3), Row(age=None)]
    """
    sc = SparkContext._active_spark_context
    if not isinstance(condition, Column):
        raise TypeError("condition should be a Column")
    v = value._jc if isinstance(value, Column) else value
    jc = sc._jvm.functions.when(condition._jc, v)
    return Column(jc)
Code Example #14
File: functions.py Project: bopopescu/spark-14
 def _(col):
     sc = SparkContext._active_spark_context
     jc = getattr(sc._jvm.functions,
                  name)(col._jc if isinstance(col, Column) else col)
     return Column(jc)
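
This is the unary counterpart of example #2. A compact sketch of how such a closure is typically bound to a public name follows; the factory name _create_function and the chosen function are assumptions for illustration, not taken from the snippet.

from pyspark import SparkContext
from pyspark.sql import Column


def _create_function(name, doc=""):
    # Hypothetical factory: wraps a one-argument JVM function of the given name.
    def _(col):
        sc = SparkContext._active_spark_context
        jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _


# Illustrative wiring; lower exists as a JVM-side function.
lower = _create_function("lower", "Converts a string column to lower case.")
# df.select(lower(df.name)).collect()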
Code Example #15
File: functions.py Project: bopopescu/spark-14
 def __call__(self, *cols):
     sc = SparkContext._active_spark_context
     jcols = ListConverter().convert([_to_java_column(c) for c in cols],
                                     sc._gateway._gateway_client)
     jc = self._judf.apply(sc._jvm.PythonUtils.toSeq(jcols))
     return Column(jc)
Code Example #16
File: functions.py Project: swapniltadasare/spark
 def __call__(self, *cols):
     sc = SparkContext._active_spark_context
     jc = self._judf.apply(_to_seq(sc, cols, _to_java_column))
     return Column(jc)
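
Examples #15 and #16 are two revisions of UserDefinedFunction.__call__. A hedged sketch of the user-facing path that ends up in this method: the udf helper and IntegerType are standard PySpark, while the lambda, column, and alias are illustrative.

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Calling slen(...) below invokes __call__, which forwards the columns to the
# wrapped JVM UDF via _to_seq / PythonUtils.toSeq as shown above.
slen = udf(lambda s: len(s), IntegerType())
df.select(slen(df.name).alias("name_length")).collect()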
Code Example #17
 def _(col):
     spark_ctx = SparkContext._active_spark_context
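     # Resolve the JVM-side function by name from the com.sparklingpandas.functions object.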
     java_ctx = (getattr(
         spark_ctx._jvm.com.sparklingpandas.functions,
         name)(col._java_ctx if isinstance(col, Column) else col))
     return Column(java_ctx)