Python DataFrame.join Examples

Programming language: Python

Namespace/package name: cudf

Class/type: DataFrame

Method/function: join

Examples on hotexamples.com: 1

Python DataFrame.join - 1 code example found. This is a real-world example of cudf.DataFrame.join in Python, extracted from an open source project.

Example #1

File: process_routing.py Project: cjber/ahah

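import cudf
# Assumption: NearestNeighbors is cuML's GPU implementation, since it is
# called with output_type="cudf"; the original import is not shown.
from cuml.neighbors import NearestNeighbors


def get_buffers(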
    poi: cudf.DataFrame,
    postcodes: cudf.DataFrame,
    k: int,
) -> cudf.DataFrame:
    """
    Estimate the buffer size required to capture each necessary road node.

    Calculates the k nearest neighbours between POIs and road nodes, and finds
    each node that is considered a neighbour to a POI (`k*len(poi)`). Buffers
    are taken as the distance to the furthest neighbour, and all nodes
    associated with each POI are saved.

    Parameters
    ----------
    poi : cudf.DataFrame
        Dataframe of all POIs
    postcodes : cudf.DataFrame
        Dataframe of postcodes
    k : int
        Number of neighbours to use

    Returns
    -------
    cudf.DataFrame:
        POI dataframe including buffer and column with list of nodes
    """
    nbrs = NearestNeighbors(n_neighbors=k,
                            output_type="cudf",
                            algorithm="brute").fit(poi[["easting",
                                                        "northing"]])
    distances, indices = nbrs.kneighbors(postcodes[["easting", "northing"]])

    poi_nn = (
        postcodes.join(indices)[["node_id"] +
                                indices.columns.tolist()].set_index("node_id").
        stack().rename("poi_idx").reset_index().rename(columns={
            "level_0": "pc_node"
        }).drop("level_1",
                axis=1).groupby("poi_idx").agg(list).join(poi, how="right"))

    # retain only unique postcode ids
    poi_nn["pc_node"] = (poi_nn["pc_node"].to_pandas().apply(
        lambda row: list(set(row)) if row is not None else row))

    distances = distances.stack().rename("dist").reset_index().drop("level_1",
                                                                    axis=1)
    indices = indices.stack().rename("ind").reset_index().drop("level_1",
                                                               axis=1)

    poi_nodes = (poi_nn[[
        "node_id"
    ]].iloc[indices["ind"].values]["node_id"].reset_index(drop=True))
    buffers = cudf.DataFrame({
        "node_id": poi_nodes,
        "buffer": distances["dist"].values
    })
    buffers = buffers.sort_values("buffer",
                                  ascending=False).drop_duplicates("node_id")
    buffers["buffer"] = buffers["buffer"].astype("int")

    # this will drop rows that did not appear in the KNN, i.e. unneeded POIs
    return (poi_nn.merge(buffers, on="node_id",
                         how="left").dropna().drop_duplicates("node_id"))
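
The `groupby("poi_idx").agg(list).join(poi, how="right")` step in the example above relies on `join` matching rows by index. The following is a minimal sketch of that pattern, not the project's code: it assumes pandas as a CPU stand-in for cudf (cudf's `DataFrame.join` is also index-based), and the toy frames `poi` and `pairs` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: three POIs with a default-style index 0..2,
# standing in for the `poi` frame in the example above.
poi = pd.DataFrame({"node_id": [100, 101, 102]}, index=[0, 1, 2])

# Hypothetical long-format neighbour table: one row per
# (postcode node, POI index) pair, as produced by stack/reset_index.
pairs = pd.DataFrame({
    "pc_node": [10, 11, 11, 12],
    "poi_idx": [0, 0, 1, 1],
})

# Collect the postcode nodes for each POI; the result is indexed by poi_idx.
grouped = pairs.groupby("poi_idx").agg(list)

# Right join on the index: every row of `poi` is kept, and POI 2,
# which has no neighbours in `pairs`, gets a missing value in pc_node.
poi_nn = grouped.join(poi, how="right")

print(poi_nn)
```

With `how="right"`, POIs that never appear as a neighbour survive the join with a missing `pc_node`, which is consistent with the example's final `dropna()` being described as the step that discards unneeded POIs.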