Python DataFrame.partition_by_hash Examples

Programming language: Python

Namespace/package name: cudf.dataframe.dataframe

Class/type: DataFrame

Method/function: partition_by_hash

Examples on hotexamples.com: 2

Python DataFrame.partition_by_hash - 2 examples found. These are the top-rated real-world Python examples of cudf.dataframe.dataframe.DataFrame.partition_by_hash, extracted from open source projects.

Example #1

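import numpy as np
from cudf.dataframe.dataframe import DataFrame


# nrows, nparts and nkeys are supplied by the test runner
# (the parametrization is not shown in the source).
def test_dataframe_hash_partition(nrows, nparts, nkeys):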
    np.random.seed(123)
    gdf = DataFrame()
    keycols = []
    for i in range(nkeys):
        keyname = 'key{}'.format(i)
        gdf[keyname] = np.random.randint(0, 7 - i, nrows)
        keycols.append(keyname)
    gdf['val1'] = np.random.randint(0, nrows * 2, nrows)

    got = gdf.partition_by_hash(keycols, nparts=nparts)
    # Must return a list
    assert isinstance(got, list)
    # Must have correct number of partitions
    assert len(got) == nparts
    # All partitions must be DataFrame type
    assert all(isinstance(p, DataFrame) for p in got)
    # Check that all partitions have unique keys
    part_unique_keys = set()
    for p in got:
        if len(p):
            # Take the rows of the key columns and build a set of key tuples
            unique_keys = set(map(tuple, p.as_matrix(columns=keycols)))
            # Ensure that none of the key-values have occurred in other groups
            assert not (unique_keys & part_unique_keys)
            part_unique_keys |= unique_keys
    # At least one key must have been seen across all partitions
    assert len(part_unique_keys)
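
The property checked in Example #1 (rows with equal keys always land in the same partition, so the key sets of different partitions are disjoint) can be illustrated without a GPU. The following is a minimal sketch using NumPy only; `partition_rows_by_hash` is a hypothetical helper written for illustration and is not how cuDF implements `partition_by_hash`.

```python
import numpy as np


def partition_rows_by_hash(keys, nparts):
    """Split row indices into `nparts` groups by hashing each row's key tuple.

    `keys` is a 2-D integer array: one row per record, one column per key.
    Illustrative stand-in only, not cuDF's implementation.
    """
    keys = np.asarray(keys)
    # The built-in hash of a tuple of ints is deterministic, and Python's %
    # always yields a value in [0, nparts) for a positive nparts.
    part_ids = np.array(
        [hash(tuple(int(v) for v in row)) % nparts for row in keys],
        dtype=np.int64,
    )
    # One array of row indices per partition
    return [np.flatnonzero(part_ids == p) for p in range(nparts)]


rng = np.random.RandomState(123)
keys = np.column_stack([rng.randint(0, 7, 50), rng.randint(0, 6, 50)])
parts = partition_rows_by_hash(keys, nparts=3)

# Same check as the test: no key tuple may appear in two partitions
seen = set()
for idx in parts:
    unique_keys = set(map(tuple, keys[idx].tolist()))
    assert not (unique_keys & seen)
    seen |= unique_keys
```

Because the partition id depends only on the key values, identical key tuples cannot be split across partitions, which is exactly what the disjointness assertion relies on.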

Example #2

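import numpy as np
from cudf.dataframe.dataframe import DataFrame

# `utils` is the original project's test-helper module; its import path
# is not shown in the source.


# nrows is supplied by the test runner (parametrization not shown in the source).
def test_dataframe_hash_partition_masked_value(nrows):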
    gdf = DataFrame()
    gdf['key'] = np.arange(nrows)
    gdf['val'] = np.arange(nrows) + 100
    bitmask = utils.random_bitmask(nrows)
    bytemask = utils.expand_bits_to_bytes(bitmask)
    gdf['val'] = gdf['val'].set_mask(bitmask)
    parted = gdf.partition_by_hash(['key'], nparts=3)
    # Verify that the valid mask is correct
    for p in parted:
        df = p.to_pandas()
        for row in df.itertuples():
            valid = bool(bytemask[row.key])
            # This test expects masked (null) values to come back as -1
            expected_value = row.key + 100 if valid else -1
            got_value = row.val
            assert expected_value == got_value
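
Example #2 relies on a packed validity bitmask being expanded to one flag per row. A rough NumPy sketch of that idea follows. It assumes the Arrow-style layout in which bit i, counted from the least significant bit of each byte, marks row i as valid; `expand_bits_to_bytes_sketch` and its `nrows` parameter are hypothetical and may differ from the project's own `utils.expand_bits_to_bytes`.

```python
import numpy as np


def expand_bits_to_bytes_sketch(bitmask, nrows):
    """Return one 0/1 byte per row from a packed validity bitmask.

    Assumes least-significant-bit-first ordering within each byte.
    Hypothetical helper, not the project's own utility.
    """
    packed = np.asarray(bitmask, dtype=np.uint8)
    bits = np.unpackbits(packed, bitorder="little")
    # Drop the padding bits past the last row
    return bits[:nrows]


# Rows 0, 2 and 9 are valid; every other row is masked
bitmask = np.array([0b00000101, 0b00000010], dtype=np.uint8)
bytemask = expand_bits_to_bytes_sketch(bitmask, nrows=10)

# Mirror the expectation in the test: valid rows keep key + 100,
# masked rows are represented as -1
values = np.arange(10) + 100
expected = np.where(bytemask.astype(bool), values, -1)
```

The `bitorder` argument of `np.unpackbits` requires NumPy 1.17 or later.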