Python DataFrame.to_parquet Examples

Programming language: Python

Namespace/package name: mars.dataframe

Class/type: DataFrame

Method/function: to_parquet

Examples on hotexamples.com: 2

Python DataFrame.to_parquet - 2 examples found. These are the top-rated real-world Python examples of mars.dataframe.DataFrame.to_parquet, extracted from open source projects.

Frequently used methods

DataFrame(27)

sort_values(6)

execute(4)

quantile(3)

sort_index(3)

to_csv(3)

corr(2)

corrwith(2)

dot(2)

to_parquet(2)

to_vineyard(2)

to_sql(1)

Example #1

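import os
import tempfile

import numpy as np
import pandas as pd
from mars.dataframe import DataFrame


def test_to_parquet_fast_parquet_execution():
    # Build a 100-row pandas frame and wrap it as a Mars DataFrame in chunks of 33 rows
    raw = pd.DataFrame({
        'col1': np.random.rand(100),
        'col2': np.arange(100),
        'col3': np.random.choice(['a', 'b', 'c'], (100, )),
    })
    df = DataFrame(raw, chunk_size=33)

    with tempfile.TemporaryDirectory() as base_path:
        # test fastparquet; to_parquet is lazy, so .execute() triggers the write
        path = os.path.join(base_path, 'out-fastparquet-*.parquet')
        df.to_parquet(path, engine='fastparquet', compression='gzip').execute()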
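
The example above combines `chunk_size=33` with a `*` wildcard in the output path. A minimal sketch of what that combination probably produces, using only the standard library: it assumes that Mars writes one file per chunk and replaces `*` with the chunk index. The helpers `chunk_row_ranges` and `expand_wildcard` are hypothetical names for illustration and are not part of Mars.

```python
import os


def chunk_row_ranges(n_rows, chunk_size):
    """Split n_rows into consecutive (start, stop) ranges of at most chunk_size rows."""
    return [(start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]


def expand_wildcard(path, n_chunks):
    """Assumed naming rule: '*' in the path is replaced by the chunk index."""
    return [path.replace('*', str(i)) for i in range(n_chunks)]


# 100 rows in chunks of 33 rows, as in the example above
ranges = chunk_row_ranges(100, 33)
files = expand_wildcard(
    os.path.join('base_path', 'out-fastparquet-*.parquet'), len(ranges))

for (start, stop), name in zip(ranges, files):
    print(f'{name}: rows {start}..{stop - 1}')
```

Under these assumptions, 100 rows yield three chunks of 33 rows plus a final chunk holding the single remaining row, so four files would be written.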

Example #2

File: test_datastore_execute.py Project: timgates42/mars

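    # Method of a test case class; assumes module-level imports of os, tempfile,
    # numpy as np, pandas as pd, mars.dataframe as md and
    # `from mars.dataframe import DataFrame`; self.executor is the test's executor.
    def testToParquetArrowExecution(self):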
        raw = pd.DataFrame({
            'col1': np.random.rand(100),
            'col2': np.arange(100),
            'col3': np.random.choice(['a', 'b', 'c'], (100, )),
        })
        df = DataFrame(raw, chunk_size=33)

        with tempfile.TemporaryDirectory() as base_path:
            # DATAFRAME TESTS
            path = os.path.join(base_path, 'out-*.parquet')
            r = df.to_parquet(path)
            self.executor.execute_dataframe(r)

            read_df = md.read_parquet(path)
            result = self.executor.execute_dataframe(read_df, concat=True)[0]
            result = result.sort_index()
            pd.testing.assert_frame_equal(result, raw)

            # test read_parquet then to_parquet
            read_df = md.read_parquet(path)
            r = read_df.to_parquet(path)
            self.executor.execute_dataframes([r])

            # test partition_cols
            path = os.path.join(base_path, 'out-partitioned')
            r = df.to_parquet(path, partition_cols=['col3'])
            self.executor.execute_dataframe(r)

            read_df = md.read_parquet(path)
            result = self.executor.execute_dataframe(read_df, concat=True)[0]
            result['col3'] = result['col3'].astype('object')
            pd.testing.assert_frame_equal(
                result.sort_values('col1').reset_index(drop=True),
                raw.sort_values('col1').reset_index(drop=True))
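
The `partition_cols` part of the test compares the frames only after sorting by `col1` and resetting the index. A rough standard-library sketch of why that is needed: it assumes the hive-style `col3=value` directory layout that pyarrow-based writers commonly use, where the partition value is kept in the directory name instead of in the stored rows. `partition_rows` and `read_back` are hypothetical helpers, not Mars or pandas APIs, and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Made-up sample rows with the same columns as the test data
rows = [
    {'col1': 0.42, 'col2': 0, 'col3': 'b'},
    {'col1': 0.17, 'col2': 1, 'col3': 'a'},
    {'col1': 0.93, 'col2': 2, 'col3': 'c'},
    {'col1': 0.58, 'col2': 3, 'col3': 'a'},
]


def partition_rows(rows, partition_col, root='out-partitioned'):
    """Group rows under hive-style 'root/col=value' directory names."""
    layout = defaultdict(list)
    for row in rows:
        directory = f'{root}/{partition_col}={row[partition_col]}'
        # The partition value lives in the directory name, not in the stored row
        layout[directory].append(
            {k: v for k, v in row.items() if k != partition_col})
    return dict(layout)


def read_back(layout, partition_col):
    """Rebuild rows by visiting directories in sorted order."""
    restored = []
    for directory in sorted(layout):
        value = directory.rsplit('=', 1)[1]
        for row in layout[directory]:
            restored.append({**row, partition_col: value})
    return restored


layout = partition_rows(rows, 'col3')
restored = read_back(layout, 'col3')

# Row order follows the partitions, so compare only after sorting by col1
by_col1 = lambda r: r['col1']
print(restored == rows)
print(sorted(restored, key=by_col1) == sorted(rows, key=by_col1))
```

The same reasoning may explain the `astype('object')` call in the test: a partition column rebuilt from directory names is typically returned as a categorical column, so it would need converting before it can match the original string column.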