Python dataframe.drop_duplicates Examples

Programming Language: Python

Namespace/Package Name: dask

Class/Type: dataframe

Method/Function: drop_duplicates

Examples at hotexamples.com: 2

These are the top-rated real-world Python examples of dask.dataframe.DataFrame.drop_duplicates, extracted from open-source projects.

Frequently Used Methods

groupby(7), select_dtypes(4), compute(3), set_index(3), dropna(3), reset_index(3), map_partitions(3), append(2), isnull(2), drop(2), drop_duplicates(2), join(1), get_partition(1), reindex(1), fillna(1), sample(1), to_delayed(1)

Example #1

File: transformations.py Project: dansilva11/candlestick-data-pipeline

from typing import List, Optional

import dask.dataframe as dd


def drop_duplicate_rows(data: dd.DataFrame,
                        subset: Optional[List[str]] = None,
                        keep: str = "first") -> dd.DataFrame:
    """
    Drop rows containing duplicate data for the specified subset of columns
    :param data: dask dataframe
    :param subset: list of column names; None compares all columns
    :param keep: which duplicate to keep ("first" or "last")
    :return: dask dataframe with the duplicate rows removed
    """
    return data.drop_duplicates(subset=subset, keep=keep)

Example #2

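    def transform(self, X: dd.DataFrame, y=None) -> dd.DataFrame:
        """
        Remove duplicated rows

        Args:
            X (dd.DataFrame): Dataframe to be processed
            y (dd, optional): Target, unused by this method. Defaults to None.

        Returns:
            (dd.DataFrame): Dataframe with duplicated rows removed, comparing
                the columns in self.subset (all columns if None)
        """
        return X.drop_duplicates(subset=self.subset)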