    def _use_aggregation(self, agg, columns=None):
        """Compute the result using the aggregation function provided.
        Each aggregated column is aliased back to its original name so the
        extra name that Spark SQL adds is stripped off."""
        if not columns:
            columns = self._columns

        aggs = [agg(column).alias(column) for column in columns]
        aggRdd = self._grouped_spark_sql.agg(*aggs)
        df = Dataframe.from_schema_rdd(aggRdd, self._by)
        return df
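    # A minimal usage sketch (illustrative only): assuming ``grouped`` is an
    # instance of this grouped-dataframe wrapper, e.g. produced by a
    # hypothetical ``df.groupby("key")`` call, an aggregation could be run as:
    #
    #     from pyspark.sql import functions as F
    #     summed = grouped._use_aggregation(F.sum)             # every column
    #     priced = grouped._use_aggregation(F.max, ["price"])  # a subset
    #
    # Each call builds one Spark SQL aggregate expression per column, aliases
    # it back to the original column name, and wraps the grouped result in a
    # distributed Dataframe keyed by the grouping columns in ``self._by``.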
    def from_pd_data_frame(self, local_df):
        """Make a distributed dataframe from a local pandas dataframe. The
        intended use is for testing. Note: dtypes are re-inferred, so they
        may not match."""
        def frame_to_rows(frame):
            """Convert a pandas DataFrame into lists of values for Spark SQL rows."""
            # TODO: Convert to Row objects directly?
            return [r.tolist() for r in frame.to_records()]
        schema = list(local_df.columns)
        index_names = list(local_df.index.names)
        index_names = _normalize_index_names(index_names)
        schema = index_names + schema
        rows = self.spark_ctx.parallelize(frame_to_rows(local_df))
        sp_df = Dataframe.from_schema_rdd(
            self.sql_ctx.createDataFrame(
                rows,
                schema=schema,
                # Look at all the rows; this should be fine since the data
                # comes from a local dataset.
                samplingRatio=1))
        sp_df._index_names = index_names
        return sp_df
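    # A minimal usage sketch (illustrative only): assuming this method lives
    # on a context wrapper exposing ``spark_ctx`` (a SparkContext) and
    # ``sql_ctx`` (a SQLContext), and that ``psc`` below is such an instance:
    #
    #     import pandas as pd
    #     local = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
    #     distributed = psc.from_pd_data_frame(local)
    #
    # The index names become leading columns (normalized by
    # _normalize_index_names) so the distributed dataframe can rebuild a
    # matching pandas index later; with samplingRatio=1 Spark re-infers the
    # schema from every row, which is acceptable for a small local dataset.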