Python DataFrame.partition_by_hash Examples

Programming language: Python

Namespace/Package: pygdf.dataframe

Class/Type: DataFrame

Method/Function: partition_by_hash

Examples at hotexamples.com: 2

Python DataFrame.partition_by_hash - 2 examples found. These are the top-rated real-world examples of pygdf.dataframe.DataFrame.partition_by_hash extracted from open source projects. You can rate the examples to help us improve their quality.

Frequently used methods

DataFrame(30)

from_pandas(12)

to_pandas(11)

set_index(7)

one_hot_encoding(5)

groupby(5)

label_encoding(3)

to_string(2)

partition_by_hash(2)

to_records(2)

as_matrix(2)

as_gpu_matrix(2)

query(2)

sort_values(1)

set_tdf(1)

assign(1)

from_records(1)

get_tdf(1)

nsmallest(1)

nlargest(1)

merge(1)

join(1)

hash_columns(1)

copy(1)

concat(1)

Example #1

import numpy as np
from pygdf.dataframe import DataFrame


# nrows, nparts and nkeys are presumably supplied by the test framework's
# parametrization, which is not shown in the extract.
def test_dataframe_hash_partition(nrows, nparts, nkeys):
    np.random.seed(123)
    gdf = DataFrame()
    keycols = []
    for i in range(nkeys):
        keyname = 'key{}'.format(i)
        gdf[keyname] = np.random.randint(0, 7 - i, nrows)
        keycols.append(keyname)
    gdf['val1'] = np.random.randint(0, nrows * 2, nrows)

    got = gdf.partition_by_hash(keycols, nparts=nparts)
    # Must return a list
    assert isinstance(got, list)
    # Must have correct number of partitions
    assert len(got) == nparts
    # All partitions must be DataFrame type
    assert all(isinstance(p, DataFrame) for p in got)
    # Check that all partitions have unique keys
    part_unique_keys = set()
    for p in got:
        if len(p):
            # Take rows of the key columns and build a set of the key-values
            unique_keys = set(map(tuple, p.as_matrix(columns=keycols)))
            # Ensure that none of the key-values have occurred in other groups
            assert not (unique_keys & part_unique_keys)
            part_unique_keys |= unique_keys
    assert len(part_unique_keys)
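The property this test checks (every distinct key combination ends up in exactly one partition) can be illustrated without a GPU. The following is a minimal pure-Python sketch, not pygdf's implementation: the function name `hash_partition_rows`, the row-tuple layout, and the use of Python's built-in `hash` are all assumptions made for illustration.

```python
def hash_partition_rows(rows, key_indices, nparts):
    """Hypothetical helper: split rows into nparts lists by hashing key columns."""
    parts = [[] for _ in range(nparts)]
    for row in rows:
        # Build the composite key from the selected columns
        key = tuple(row[i] for i in key_indices)
        # Rows with equal keys hash identically, so they share a partition
        parts[hash(key) % nparts].append(row)
    return parts


# Each row is (key0, key1, val); every key pair appears twice
rows = [(k0, k1, val) for k0 in range(4) for k1 in range(3) for val in (10, 20)]
parts = hash_partition_rows(rows, key_indices=(0, 1), nparts=3)

# Same check as the test above: no key may occur in two partitions
seen_keys = set()
for part in parts:
    part_keys = {(r[0], r[1]) for r in part}
    assert not (part_keys & seen_keys)
    seen_keys |= part_keys
```

Because the partition index depends only on the key values, some partitions may be empty when there are few distinct keys, which is why the test above guards the per-partition check with `if len(p)`.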

Example #2

import numpy as np
from pygdf.dataframe import DataFrame
# `utils` is the source project's test-helper module; its import is not
# shown in the extract.


def test_dataframe_hash_partition_masked_value(nrows):
    gdf = DataFrame()
    gdf['key'] = np.arange(nrows)
    gdf['val'] = np.arange(nrows) + 100
    bitmask = utils.random_bitmask(nrows)
    bytemask = utils.expand_bits_to_bytes(bitmask)
    gdf['val'] = gdf['val'].set_mask(bitmask)
    parted = gdf.partition_by_hash(['key'], nparts=3)
    # Verify that the valid mask is correct
    for p in parted:
        df = p.to_pandas()
        for row in df.itertuples():
            valid = bool(bytemask[row.key])
            # Masked-out (null) values are expected to come back as -1
            expected_value = row.key + 100 if valid else -1
            got_value = row.val
            assert expected_value == got_value
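The expected values in this test depend on turning a packed validity bitmask into one flag per row. The sketch below shows one way to do that with NumPy. It does not reproduce the project's `utils.expand_bits_to_bytes`: the function name `expand_bits_sketch` is hypothetical, and the least-significant-bit-first ordering is an assumption.

```python
import numpy as np


def expand_bits_sketch(bitmask, nrows):
    """Hypothetical helper: unpack a uint8 bitmask into one 0/1 flag per row.

    Assumes bit i of the mask (least significant bit first) describes row i.
    """
    bits = np.unpackbits(np.asarray(bitmask, dtype=np.uint8), bitorder="little")
    # The last byte may carry padding bits beyond nrows; drop them
    return bits[:nrows]


nrows = 10
bitmask = np.array([0b10110101, 0b00000010], dtype=np.uint8)
bytemask = expand_bits_sketch(bitmask, nrows)

keys = np.arange(nrows)
# Valid rows keep key + 100; masked-out rows are expected to read as -1
expected = np.where(bytemask.astype(bool), keys + 100, -1)
```

With this mask, rows 1, 3, 6 and 8 are invalid, so `expected` holds -1 at those positions and `key + 100` elsewhere, which mirrors the per-row comparison in the test.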