Example #1
 def landuse_agg_wrapper():
     mu_data_hill, mu_data_other, ac_data, ac_data_wt = data.aggregate_landuses(node_data,
                                                                                edge_data,
                                                                                node_edge_map,
                                                                                data_map,
                                                                                distances,
                                                                                betas,
                                                                                mixed_use_hill_keys=np.array([0, 1]),
                                                                                landuse_encodings=landuse_encodings,
                                                                                qs=qs,
                                                                                angular=False)
Example #2
    def compute_landuses(self,
                         landuse_labels: list | tuple | np.ndarray,
                         mixed_use_keys: list | tuple = None,
                         accessibility_keys: list | tuple = None,
                         cl_disparity_wt_matrix: list | tuple | np.ndarray = None,
                         qs: list | tuple | np.ndarray = None,
                         jitter_scale: float = 0.0,
                         angular: bool = False):
        """
        This method wraps the underlying `numba` optimised functions for aggregating and computing various mixed-use and
        land-use accessibility measures. These are computed simultaneously for any required combinations of measures
        (and distances), which can have significant speed implications. Situations requiring only a single measure can
        instead make use of the simplified [`DataLayer.hill_diversity`](#datalayerhill_diversity),
        [`DataLayer.hill_branch_wt_diversity`](#datalayerhill_branch_wt_diversity), and
        [`DataLayer.compute_accessibilities`](#datalayercompute_accessibilities) methods.

        See the accompanying paper on `arXiv` for additional information about methods for computing mixed-use measures
        at the pedestrian scale.

        import ArXivLink from '../../src/components/ArXivLink.vue'

        <ArXivLink arXivLink='https://arxiv.org/abs/2106.14048'/>

        The data is aggregated and computed over the street network relative to the `Network Layer` nodes, with the
        implication that mixed-use and land-use accessibility aggregations are generated from the same locations as
        for centrality computations, which can therefore be correlated or otherwise compared. The outputs of the
        calculations are written to the corresponding node indices in the same `NetworkLayer.metrics` dictionary used
        for centrality methods, and will be categorised by the respective keys and parameters.

        For example, if `hill` and `shannon` mixed-use keys; `shops` and `factories` accessibility keys are computed on
        a `Network Layer` instantiated with 800m and 1600m distance thresholds, then the dictionary would assume the
        following structure:

        ```python
        NetworkLayer.metrics = {
            'mixed_uses': {
                # note that hill measures have q keys
                'hill': {
                    # here, q=0
                    0: {
                        800: [...],
                        1600: [...]
                    },
                    # here, q=1
                    1: {
                        800: [...],
                        1600: [...]
                    }
                },
                # non-hill measures do not have q keys
                'shannon': {
                    800: [...],
                    1600: [...]
                }
            },
            'accessibility': {
                # accessibility keys are computed in both weighted and unweighted forms
                'weighted': {
                    'shops': {
                        800: [...],
                        1600: [...]
                    },
                    'factories': {
                        800: [...],
                        1600: [...]
                    }
                },
                'non_weighted': {
                    'shops': {
                        800: [...],
                        1600: [...]
                    },
                    'factories': {
                        800: [...],
                        1600: [...]
                    }
                }
            }
        }
        ```

        Parameters
        ----------
        landuse_labels
            A set of land-use labels corresponding to the length and order of the data points. The labels should
            correspond to descriptors from the land-use schema, such as "retail" or "commercial". This parameter is only
            required if computing mixed-uses or land-use accessibilities.
        mixed_use_keys
            An optional list of strings describing which mixed-use metrics to compute, containing any combination of
            `key` values from the table in **Notes**, by default None.
        accessibility_keys
            An optional `list` or `tuple` of land-use classifications for which to calculate accessibilities. The keys
            should be selected from the same land-use schema used for the `landuse_labels` parameter, e.g. "retail". The
            calculations will be performed in both `weighted` and `non_weighted` variants, by default None.
        cl_disparity_wt_matrix
            A pairwise `NxN` disparity matrix numerically describing the degree of disparity between any pair of
            distinct land-uses. This parameter is only required if computing mixed-uses using `hill_pairwise_disparity`
            or `raos_pairwise_disparity`.  The number and order of land-uses should match those implicitly generated by
            [`encode_categorical`](#encode_categorical), by default None.
        qs
            The values of `q` for which to compute Hill diversity. This parameter is only required if computing one of
            the Hill diversity mixed-use measures, by default None.
        jitter_scale
            The scale of random jitter to add to shortest path calculations, useful for situations with highly
            rectilinear grids. `jitter_scale` is passed to the `scale` parameter of `np.random.normal`. Default of zero.
        angular
            Whether to use a simplest-path heuristic in lieu of a shortest-path heuristic when calculating aggregations
            and distances, by default False.

        Notes
        -----
        | key | formula | notes |
        |-----|:-------:|-------|
        | hill | $\scriptstyle\big(\sum_{i}^{S}p_{i}^q\big)^{1/(1-q)}\ q\geq0,\ q\neq1 \\ \scriptstyle lim_{q\to1}\
        exp\big(-\sum_{i}^{S}\ p_{i}\ log\ p_{i}\big)$ | Hill diversity: this is the preferred form of diversity
        metric because it adheres to the replication principle and uses units of effective species instead of measures
        of information or uncertainty. The `q` parameter controls the degree of emphasis on the _richness_ of species as
        opposed to the _balance_ of species. Over-emphasis on balance can be misleading in an urban context, for which
        reason research finds support for using `q=0`: this reduces to a simple count of distinct land-uses.|
        | hill_branch_wt | $\scriptstyle\big[\sum_{i}^{S}d_{i}\big(\frac{p_{i}}{\bar{T}}\big)^{q} \big]^{1/(1-q)} \\
        \scriptstyle\bar{T} = \sum_{i}^{S}d_{i}p_{i}$ | This is a distance-weighted variant of Hill Diversity based
        on the distances from the point of computation to the nearest example of a particular land-use. It therefore
        gives a locally representative indication of the intensity of mixed-uses. $d_{i}$ is a negative exponential
        function where $\beta$ controls the strength of the decay. ($\beta$ is provided by the `Network Layer`, see
        [`distance_from_beta`](/metrics/networks/#distance_from_beta).)|
        | hill_pairwise_wt | $\scriptstyle\big[ \sum_{i}^{S} \sum_{j\neq{i}}^{S} d_{ij} \big(  \frac{p_{i} p_{j}}{Q}
        \big)^{q} \big]^{1/(1-q)} \\ \scriptstyle Q = \sum_{i}^{S} \sum_{j\neq{i}}^{S} d_{ij} p_{i} p_{j}$ | This is a
        pairwise-distance-weighted variant of Hill Diversity based on the respective distances between the closest
        examples of the pairwise distinct land-use combinations as routed through the point of computation.
        $d_{ij}$ represents a negative exponential function where $\beta$ controls the strength of the decay.
        ($\beta$ is provided by the `Network Layer`, see
        [`distance_from_beta`](/metrics/networks/#distance_from_beta).)|
        | hill_pairwise_disparity | $\scriptstyle\big[ \sum_{i}^{S} \sum_{j\neq{i}}^{S} w_{ij} \big(  \frac{p_{i}
        p_{j}}{Q} \big)^{q} \big]^{1/(1-q)} \\ \scriptstyle Q = \sum_{i}^{S} \sum_{j\neq{i}}^{S} w_{ij} p_{i}
        p_{j}$ | This is a disparity-weighted variant of Hill Diversity based on the pairwise disparities between
        land-uses. This variant requires the use of a disparity matrix provided through the `cl_disparity_wt_matrix`
        parameter.|
        | shannon | $\scriptstyle -\sum_{i}^{S}\ p_{i}\ log\ p_{i}$ | Shannon diversity (or _information entropy_) is
        one of the classic diversity indices. Note that it is preferable to use Hill Diversity with `q=1`, which is
        effectively a transformation of Shannon diversity into units of effective species.|
        | gini_simpson | $\scriptstyle 1 - \sum_{i}^{S} p_{i}^2$ | Gini-Simpson is another classic diversity index.
        It can behave problematically because it does not adhere to the replication principle and places emphasis on the
        balance of species, which can be counter-productive for purposes of measuring mixed-uses. Note that where an
        emphasis on balance is desired, it is preferable to use Hill Diversity with `q=2`, which is effectively a
        transformation of Gini-Simpson diversity into units of effective species.|
        | raos_pairwise_disparity | $\scriptstyle \sum_{i}^{S} \sum_{j \neq{i}}^{S} d_{ij} p_{i} p_{j}$ | Rao diversity
        is a pairwise disparity measure and requires the use of a disparity matrix provided through the
        `cl_disparity_wt_matrix` parameter. It suffers from the same issues as Gini-Simpson. It is preferable to use
        disparity weighted Hill diversity with `q=2`.|

        :::tip Comment
        The available choices of land-use diversity measures may seem overwhelming. `hill_branch_wt` paired with `q=0`
        is generally the best choice for granular land-use data, or else `q=1` or `q=2` for increasingly crude land-use
        classification schemas.
        :::

        A worked example:
        ```python
        from cityseer.metrics import networks, layers
        from cityseer.tools import mock, graphs

        # prepare a mock graph
        G = mock.mock_graph()
        G = graphs.nX_simple_geoms(G)

        # generate the network layer
        N = networks.NetworkLayerFromNX(G, distances=[200, 400, 800, 1600])

        # prepare a mock data dictionary
        data_dict = mock.mock_data_dict(G, random_seed=25)
        # prepare some mock land-use classifications
        landuses = mock.mock_categorical_data(len(data_dict), random_seed=25)

        # generate a data layer
        L = layers.DataLayerFromDict(data_dict)
        # assign to the network
        L.assign_to_network(N, max_dist=500)
        # compute some metrics - here we'll use the full interface, see below for simplified interfaces
        # FULL INTERFACE
        # ==============
        L.compute_landuses(landuse_labels=landuses,
                           mixed_use_keys=['hill'],
                           qs=[0, 1],
                           accessibility_keys=['c', 'd', 'e'])
        # note that the above measures can optionally be run individually using simplified interfaces, e.g.
        # SIMPLIFIED INTERFACES
        # =====================
        # L.hill_diversity(landuses, qs=[0])
        # L.compute_accessibilities(landuses, ['a', 'b'])

        # let's prepare some keys for accessing the computational outputs
        # distance idx: any of the distances with which the NetworkLayer was initialised
        distance_idx = 200
        # q index: any of the invoked q parameters
        q_idx = 0
        # a node idx
        node_idx = 0

        # the data is available at N.metrics
        print(N.metrics['mixed_uses']['hill'][q_idx][distance_idx][node_idx])
        # prints: 4.0
        print(N.metrics['accessibility']['weighted']['d'][distance_idx][node_idx])
        # prints: 0.019168843947614676
        print(N.metrics['accessibility']['non_weighted']['d'][distance_idx][node_idx])
        # prints: 1.0
        ```

        Note that the data can also be unpacked to a dictionary using [`NetworkLayer.metrics_to_dict`](/metrics/networks/#networklayermetrics_to_dict), or transposed to a `networkX` graph using [`NetworkLayer.to_networkX`](/metrics/networks/#networklayerto_networkx).

        :::danger Caution
        Be cognisant that mixed-use and land-use accessibility measures are sensitive to the classification schema that has been used. Meaningful comparisons from one location to another are only possible where the same schemas have been applied.
        :::
        """
        if self.Network is None:
            raise ValueError(
                'Assign this data layer to a network prior to computing mixed-uses or accessibilities.'
            )
        mixed_uses_options = [
            'hill', 'hill_branch_wt', 'hill_pairwise_wt',
            'hill_pairwise_disparity', 'shannon', 'gini_simpson',
            'raos_pairwise_disparity'
        ]
        # remember, most checks on parameter integrity occur in underlying method
        # so, don't duplicate here
        if len(landuse_labels) != len(self._data):
            raise ValueError(
                'The number of landuse labels should match the number of data points.'
            )
        # get the landuse encodings
        landuse_classes, landuse_encodings = encode_categorical(landuse_labels)
        # if necessary, check the disparity matrix
        if cl_disparity_wt_matrix is None:
            cl_disparity_wt_matrix = np.full((0, 0), np.nan)
        else:
            # convert list or tuple input first: only numpy arrays expose .ndim and .shape
            if isinstance(cl_disparity_wt_matrix, (list, tuple)):
                cl_disparity_wt_matrix = np.array(cl_disparity_wt_matrix)
            if not isinstance(cl_disparity_wt_matrix, np.ndarray) or \
                    cl_disparity_wt_matrix.ndim != 2 or \
                    cl_disparity_wt_matrix.shape[0] != cl_disparity_wt_matrix.shape[1] or \
                    len(cl_disparity_wt_matrix) != len(landuse_classes):
                raise TypeError(
                    'Disparity weights must be a square pairwise NxN matrix in list, tuple, or numpy.ndarray form. '
                    'The number of edge-wise elements should match the number of unique class labels.'
                )
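        # default to an empty tuple if no qs are provided
        if qs is None:
            qs = ()
        # wrap a single numeric q in a tuple (the trailing comma is required to create a tuple)
        if isinstance(qs, (int, float)):
            qs = (qs,)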
        if not isinstance(qs, (list, tuple, np.ndarray)):
            raise TypeError(
                'Please provide a float, list, tuple, or numpy.ndarray of q values.'
            )
        # extrapolate the requested mixed use measures
        mu_hill_keys = []
        mu_other_keys = []
        if mixed_use_keys is not None:
            for mu in mixed_use_keys:
                if mu not in mixed_uses_options:
                    raise ValueError(
                        f'Invalid mixed-use option: {mu}. Must be one of {", ".join(mixed_uses_options)}.'
                    )
                idx = mixed_uses_options.index(mu)
                if idx < 4:
                    mu_hill_keys.append(idx)
                else:
                    mu_other_keys.append(idx - 4)
            if not checks.quiet_mode:
                logger.info(
                    f'Computing mixed-use measures: {", ".join(mixed_use_keys)}'
                )
        # figure out the corresponding indices for the landuse classes that are present in the dataset
        # these indices are passed as keys which will be matched against the integer landuse encodings
        acc_keys = []
        if accessibility_keys is not None:
            for ac_label in accessibility_keys:
                if ac_label not in landuse_classes:
                    logger.warning(
                        f'No instances of accessibility label: {ac_label} present in the data.'
                    )
                else:
                    acc_keys.append(landuse_classes.index(ac_label))
            if not checks.quiet_mode:
                logger.info(
                    f'Computing land-use accessibility for: {", ".join(accessibility_keys)}'
                )
        if not checks.quiet_mode:
            progress_proxy = ProgressBar(total=len(self.Network._node_data))
        else:
            progress_proxy = None
        # call the underlying method
        mixed_use_hill_data, mixed_use_other_data, accessibility_data, accessibility_data_wt = \
            data.aggregate_landuses(self.Network._node_data,
                                    self.Network._edge_data,
                                    self.Network._node_edge_map,
                                    self._data,
                                    distances=np.array(self.Network.distances),
                                    betas=np.array(self.Network.betas),
                                    landuse_encodings=np.array(landuse_encodings),
                                    qs=np.array(qs),
                                    mixed_use_hill_keys=np.array(mu_hill_keys),
                                    mixed_use_other_keys=np.array(mu_other_keys),
                                    accessibility_keys=np.array(acc_keys),
                                    cl_disparity_wt_matrix=np.array(cl_disparity_wt_matrix),
                                    jitter_scale=jitter_scale,
                                    angular=angular,
                                    progress_proxy=progress_proxy)
        if progress_proxy is not None:
            progress_proxy.close()
        # write the results to the Network's metrics dict
        # keys will check for pre-existing, whereas qs and distance keys will overwrite
        # unpack mixed use hill
        for mu_h_idx, mu_h_key in enumerate(mu_hill_keys):
            mu_h_label = mixed_uses_options[mu_h_key]
            if mu_h_label not in self.Network.metrics['mixed_uses']:
                self.Network.metrics['mixed_uses'][mu_h_label] = {}
            for q_idx, q_key in enumerate(qs):
                self.Network.metrics['mixed_uses'][mu_h_label][q_key] = {}
                for d_idx, d_key in enumerate(self.Network.distances):
                    self.Network.metrics['mixed_uses'][mu_h_label][q_key][d_key] = \
                        mixed_use_hill_data[mu_h_idx][q_idx][d_idx]
        # unpack mixed use other
        for mu_o_idx, mu_o_key in enumerate(mu_other_keys):
            mu_o_label = mixed_uses_options[mu_o_key + 4]
            if mu_o_label not in self.Network.metrics['mixed_uses']:
                self.Network.metrics['mixed_uses'][mu_o_label] = {}
            # no qs
            for d_idx, d_key in enumerate(self.Network.distances):
                self.Network.metrics['mixed_uses'][mu_o_label][
                    d_key] = mixed_use_other_data[mu_o_idx][d_idx]
        # unpack accessibility data
        for ac_idx, ac_code in enumerate(acc_keys):
            ac_label = landuse_classes[ac_code]  # ac_code is index of ac_label
            for k, ac_data in zip(['non_weighted', 'weighted'],
                                  [accessibility_data, accessibility_data_wt]):
                if ac_label not in self.Network.metrics['accessibility'][k]:
                    self.Network.metrics['accessibility'][k][ac_label] = {}
                for d_idx, d_key in enumerate(self.Network.distances):
                    self.Network.metrics['accessibility'][k][ac_label][
                        d_key] = ac_data[ac_idx][d_idx]
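
The `hill` row of the Notes table in the docstring above gives the Hill diversity formula, including its limit at `q=1`. The following is a minimal sketch of that formula in plain Python. It is not the library's `numba` implementation; the function name `hill_diversity_sketch` is hypothetical, and it assumes the input is a sequence of per-class counts.

```python
import math


def hill_diversity_sketch(class_counts, q):
    """Hill diversity of order q for a sequence of per-class counts (illustrative only)."""
    if q < 0:
        raise ValueError('q must be greater than or equal to zero.')
    total = sum(class_counts)
    if total == 0:
        # nothing reachable, hence no diversity
        return 0.0
    # proportions of the classes that are actually present
    probs = [count / total for count in class_counts if count > 0]
    if q == 1:
        # limit case: the exponential of Shannon entropy
        return math.exp(-sum(p * math.log(p) for p in probs))
    return sum(p ** q for p in probs) ** (1 / (1 - q))


counts = [2, 2, 0, 4]
# q=0 reduces to a simple count of the distinct classes that are present
print(hill_diversity_sketch(counts, 0))  # prints: 3.0
```

Higher values of `q` place more weight on the balance between classes, so for the uneven counts above the result for `q=2` is lower than the result for `q=0`.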
Example #3
def test_aggregate_landuses_signatures(primal_graph):
    # generate node and edge maps
    node_uids, node_data, edge_data, node_edge_map = graphs.graph_maps_from_nX(primal_graph)
    # setup data
    data_dict = mock.mock_data_dict(primal_graph, random_seed=13)
    data_uids, data_map = layers.data_map_from_dict(data_dict)
    data_map = data.assign_to_network(data_map, node_data, edge_data, node_edge_map, 500)
    # set parameters
    betas = np.array([0.02, 0.01, 0.005, 0.0025])
    distances = networks.distance_from_beta(betas)
    qs = np.array([0, 1, 2])
    mock_categorical = mock.mock_categorical_data(len(data_map))
    landuse_classes, landuse_encodings = layers.encode_categorical(mock_categorical)
    # check that empty land_use encodings are caught
    with pytest.raises(ValueError):
        data.aggregate_landuses(node_data,
                                edge_data,
                                node_edge_map,
                                data_map,
                                distances,
                                betas,
                                mixed_use_hill_keys=np.array([0]))
    # check that unequal land_use encodings vs data map lengths are caught
    with pytest.raises(ValueError):
        data.aggregate_landuses(node_data,
                                edge_data,
                                node_edge_map,
                                data_map,
                                distances,
                                betas,
                                landuse_encodings=landuse_encodings[:-1],
                                mixed_use_other_keys=np.array([0]))
    # check that requesting no metrics at all is flagged
    with pytest.raises(ValueError):
        data.aggregate_landuses(node_data,
                                edge_data,
                                node_edge_map,
                                data_map,
                                distances,
                                betas,
                                landuse_encodings=landuse_encodings)
    # check that missing qs are flagged for hill measures
    with pytest.raises(ValueError):
        data.aggregate_landuses(node_data,
                                edge_data,
                                node_edge_map,
                                data_map,
                                distances,
                                betas,
                                mixed_use_hill_keys=np.array([0]),
                                landuse_encodings=landuse_encodings)
    # check that problematic mixed use and accessibility keys are caught
    for mu_h_key, mu_o_key, ac_key in [
        # negatives
        ([-1], [1], [1]),
        ([1], [-1], [1]),
        ([1], [1], [-1]),
        # out of range
        ([4], [1], [1]),
        ([1], [3], [1]),
        ([1], [1], [max(landuse_encodings) + 1]),
        # duplicates
        ([1, 1], [1], [1]),
        ([1], [1, 1], [1]),
        ([1], [1], [1, 1])]:
        with pytest.raises(ValueError):
            data.aggregate_landuses(node_data,
                                    edge_data,
                                    node_edge_map,
                                    data_map,
                                    distances,
                                    betas,
                                    landuse_encodings,
                                    qs=qs,
                                    mixed_use_hill_keys=np.array(mu_h_key),
                                    mixed_use_other_keys=np.array(mu_o_key),
                                    accessibility_keys=np.array(ac_key))
    for h_key, o_key in (([3], []), ([], [2])):
        # check that missing matrix is caught for disparity weighted indices
        with pytest.raises(ValueError):
            data.aggregate_landuses(node_data,
                                    edge_data,
                                    node_edge_map,
                                    data_map,
                                    distances,
                                    betas,
                                    landuse_encodings=landuse_encodings,
                                    qs=qs,
                                    mixed_use_hill_keys=np.array(h_key),
                                    mixed_use_other_keys=np.array(o_key))
        # check that non-square disparity matrix is caught
        mock_matrix = np.full((len(landuse_classes), len(landuse_classes)), 1)
        with pytest.raises(ValueError):
            data.aggregate_landuses(node_data,
                                    edge_data,
                                    node_edge_map,
                                    data_map,
                                    distances,
                                    betas,
                                    landuse_encodings=landuse_encodings,
                                    qs=qs,
                                    mixed_use_hill_keys=np.array(h_key),
                                    mixed_use_other_keys=np.array(o_key),
                                    cl_disparity_wt_matrix=mock_matrix[:-1])
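
The test above expects a `ValueError` for negative, out-of-range, and duplicate keys. The following is a minimal sketch of a check with that behaviour. The helper `check_keys_sketch` is hypothetical and is not the validation used inside `data.aggregate_landuses`; it assumes keys are integers indexing into a fixed number of options (four Hill measures, three non-Hill measures, or the number of land-use classes).

```python
def check_keys_sketch(keys, n_options):
    """Raise ValueError for negative, out-of-range, or duplicate integer keys (illustrative only)."""
    seen = set()
    for key in keys:
        # valid keys index into the available options
        if key < 0 or key >= n_options:
            raise ValueError(f'Key {key} is outside the valid range 0 to {n_options - 1}.')
        # each key may be requested once only
        if key in seen:
            raise ValueError(f'Duplicate key: {key}.')
        seen.add(key)


# valid: distinct keys within range, in any order
check_keys_sketch([3, 0, 1], 4)
# invalid: each of these raises a ValueError
for bad_keys in ([-1], [4], [1, 1]):
    try:
        check_keys_sketch(bad_keys, 4)
    except ValueError as err:
        print(err)
```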
Example #4
def test_aggregate_landuses_categorical_components(primal_graph):
    # generate node and edge maps
    node_uids, node_data, edge_data, node_edge_map = graphs.graph_maps_from_nX(primal_graph)
    # setup data
    data_dict = mock.mock_data_dict(primal_graph, random_seed=13)
    data_uids, data_map = layers.data_map_from_dict(data_dict)
    data_map = data.assign_to_network(data_map, node_data, edge_data, node_edge_map, 500)
    # set parameters
    betas = np.array([0.02, 0.01, 0.005, 0.0025])
    distances = networks.distance_from_beta(betas)
    qs = np.array([0, 1, 2])
    mock_categorical = mock.mock_categorical_data(len(data_map))
    landuse_classes, landuse_encodings = layers.encode_categorical(mock_categorical)
    mock_matrix = np.full((len(landuse_classes), len(landuse_classes)), 1)
    # set the keys - add shuffling to be sure various orders work
    hill_keys = np.arange(4)
    np.random.shuffle(hill_keys)
    non_hill_keys = np.arange(3)
    np.random.shuffle(non_hill_keys)
    ac_keys = np.array([1, 2, 5])
    np.random.shuffle(ac_keys)
    # generate
    mu_data_hill, mu_data_other, ac_data, ac_data_wt = data.aggregate_landuses(node_data,
                                                                               edge_data,
                                                                               node_edge_map,
                                                                               data_map,
                                                                               distances,
                                                                               betas,
                                                                               landuse_encodings=landuse_encodings,
                                                                               qs=qs,
                                                                               mixed_use_hill_keys=hill_keys,
                                                                               mixed_use_other_keys=non_hill_keys,
                                                                               accessibility_keys=ac_keys,
                                                                               cl_disparity_wt_matrix=mock_matrix,
                                                                               angular=False)
    # hill
    hill = mu_data_hill[np.where(hill_keys == 0)][0]
    hill_branch_wt = mu_data_hill[np.where(hill_keys == 1)][0]
    hill_pw_wt = mu_data_hill[np.where(hill_keys == 2)][0]
    hill_disp_wt = mu_data_hill[np.where(hill_keys == 3)][0]
    # non hill
    shannon = mu_data_other[np.where(non_hill_keys == 0)][0]
    gini = mu_data_other[np.where(non_hill_keys == 1)][0]
    raos = mu_data_other[np.where(non_hill_keys == 2)][0]
    # access non-weighted
    ac_1_nw = ac_data[np.where(ac_keys == 1)][0]
    ac_2_nw = ac_data[np.where(ac_keys == 2)][0]
    ac_5_nw = ac_data[np.where(ac_keys == 5)][0]
    # access weighted
    ac_1_w = ac_data_wt[np.where(ac_keys == 1)][0]
    ac_2_w = ac_data_wt[np.where(ac_keys == 2)][0]
    ac_5_w = ac_data_wt[np.where(ac_keys == 5)][0]
    # test manual metrics against all nodes
    mu_max_unique = len(landuse_classes)
    # test against various distances
    for d_idx in range(len(distances)):
        dist_cutoff = distances[d_idx]
        beta = betas[d_idx]
        for src_idx in range(len(primal_graph)):
            reachable_data, reachable_data_dist, tree_preds = data.aggregate_to_src_idx(src_idx,
                                                                                        node_data,
                                                                                        edge_data,
                                                                                        node_edge_map,
                                                                                        data_map,
                                                                                        dist_cutoff)
            # counts of each class type (array length per max unique classes - not just those within max distance)
            cl_counts = np.full(mu_max_unique, 0)
            # nearest of each class type (likewise)
            cl_nearest = np.full(mu_max_unique, np.inf)
            # aggregate
            a_1_nw = 0
            a_2_nw = 0
            a_5_nw = 0
            a_1_w = 0
            a_2_w = 0
            a_5_w = 0
            # iterate reachable
            for data_idx, (reachable, data_dist) in enumerate(zip(reachable_data, reachable_data_dist)):
                if not reachable:
                    continue
                cl = landuse_encodings[data_idx]
                # double check distance is within threshold
                assert data_dist <= dist_cutoff
                # update the class counts
                cl_counts[cl] += 1
                # if distance is nearer, update the nearest distance array too
                if data_dist < cl_nearest[cl]:
                    cl_nearest[cl] = data_dist
                # aggregate accessibility codes
                if cl == 1:
                    a_1_nw += 1
                    a_1_w += np.exp(-beta * data_dist)
                elif cl == 2:
                    a_2_nw += 1
                    a_2_w += np.exp(-beta * data_dist)
                elif cl == 5:
                    a_5_nw += 1
                    a_5_w += np.exp(-beta * data_dist)
            # assertions
            assert ac_1_nw[d_idx, src_idx] == a_1_nw
            assert ac_2_nw[d_idx, src_idx] == a_2_nw
            assert ac_5_nw[d_idx, src_idx] == a_5_nw

            assert ac_1_w[d_idx, src_idx] == a_1_w
            assert ac_2_w[d_idx, src_idx] == a_2_w
            assert ac_5_w[d_idx, src_idx] == a_5_w

            assert hill[0, d_idx, src_idx] == diversity.hill_diversity(cl_counts, 0)
            assert hill[1, d_idx, src_idx] == diversity.hill_diversity(cl_counts, 1)
            assert hill[2, d_idx, src_idx] == diversity.hill_diversity(cl_counts, 2)

            assert hill_branch_wt[0, d_idx, src_idx] == \
                   diversity.hill_diversity_branch_distance_wt(cl_counts, cl_nearest, 0, beta)
            assert hill_branch_wt[1, d_idx, src_idx] == \
                   diversity.hill_diversity_branch_distance_wt(cl_counts, cl_nearest, 1, beta)
            assert hill_branch_wt[2, d_idx, src_idx] == \
                   diversity.hill_diversity_branch_distance_wt(cl_counts, cl_nearest, 2, beta)

            assert hill_pw_wt[0, d_idx, src_idx] == \
                   diversity.hill_diversity_pairwise_distance_wt(cl_counts, cl_nearest, 0, beta)
            assert hill_pw_wt[1, d_idx, src_idx] == \
                   diversity.hill_diversity_pairwise_distance_wt(cl_counts, cl_nearest, 1, beta)
            assert hill_pw_wt[2, d_idx, src_idx] == \
                   diversity.hill_diversity_pairwise_distance_wt(cl_counts, cl_nearest, 2, beta)

            assert hill_disp_wt[0, d_idx, src_idx] == \
                   diversity.hill_diversity_pairwise_matrix_wt(cl_counts, mock_matrix, 0)
            assert hill_disp_wt[1, d_idx, src_idx] == \
                   diversity.hill_diversity_pairwise_matrix_wt(cl_counts, mock_matrix, 1)
            assert hill_disp_wt[2, d_idx, src_idx] == \
                   diversity.hill_diversity_pairwise_matrix_wt(cl_counts, mock_matrix, 2)

            assert shannon[d_idx, src_idx] == diversity.shannon_diversity(cl_counts)
            assert gini[d_idx, src_idx] == diversity.gini_simpson_diversity(cl_counts)
            assert raos[d_idx, src_idx] == diversity.raos_quadratic_diversity(cl_counts, mock_matrix)

    # check that angular is passed-through
    # actual angular tests happen in test_shortest_path_tree()
    # here the emphasis is simply on checking that the angular instruction gets chained through

    # setup dual data
    G_dual = graphs.nX_to_dual(primal_graph)
    node_labels_dual, node_data_dual, edge_data_dual, node_edge_map_dual = graphs.graph_maps_from_nX(G_dual)
    data_dict_dual = mock.mock_data_dict(G_dual, random_seed=13)
    data_uids_dual, data_map_dual = layers.data_map_from_dict(data_dict_dual)
    data_map_dual = data.assign_to_network(data_map_dual, node_data_dual, edge_data_dual, node_edge_map_dual, 500)
    mock_categorical = mock.mock_categorical_data(len(data_map_dual))
    landuse_classes_dual, landuse_encodings_dual = layers.encode_categorical(mock_categorical)
    mock_matrix = np.full((len(landuse_classes_dual), len(landuse_classes_dual)), 1)

    mu_hill_dual, mu_other_dual, ac_dual, ac_wt_dual = data.aggregate_landuses(node_data_dual,
                                                                               edge_data_dual,
                                                                               node_edge_map_dual,
                                                                               data_map_dual,
                                                                               distances,
                                                                               betas,
                                                                               landuse_encodings_dual,
                                                                               qs=qs,
                                                                               mixed_use_hill_keys=hill_keys,
                                                                               mixed_use_other_keys=non_hill_keys,
                                                                               accessibility_keys=ac_keys,
                                                                               cl_disparity_wt_matrix=mock_matrix,
                                                                               angular=True)

    mu_hill_dual_sidestep, mu_other_dual_sidestep, ac_dual_sidestep, ac_wt_dual_sidestep = \
        data.aggregate_landuses(node_data_dual,
                                edge_data_dual,
                                node_edge_map_dual,
                                data_map_dual,
                                distances,
                                betas,
                                landuse_encodings_dual,
                                qs=qs,
                                mixed_use_hill_keys=hill_keys,
                                mixed_use_other_keys=non_hill_keys,
                                accessibility_keys=ac_keys,
                                cl_disparity_wt_matrix=mock_matrix,
                                angular=False)

    assert not np.allclose(mu_hill_dual, mu_hill_dual_sidestep, atol=0.001, rtol=0)
    assert not np.allclose(mu_other_dual, mu_other_dual_sidestep, atol=0.001, rtol=0)
    assert not np.allclose(ac_dual, ac_dual_sidestep, atol=0.001, rtol=0)
    assert not np.allclose(ac_wt_dual, ac_wt_dual_sidestep, atol=0.001, rtol=0)


def test_compute_landuses(primal_graph):
    betas = np.array([0.01, 0.005])
    distances = networks.distance_from_beta(betas)
    # network layer
    N = networks.NetworkLayerFromNX(primal_graph, distances=distances)
    node_map = N._node_data
    edge_map = N._edge_data
    node_edge_map = N._node_edge_map
    # data layer
    data_dict = mock.mock_data_dict(primal_graph)
    qs = np.array([0, 1, 2])
    D = layers.DataLayerFromDict(data_dict)
    # check single metrics independently against underlying for some use-cases, e.g. hill, non-hill, accessibility...
    D.assign_to_network(N, max_dist=500)
    # generate some mock landuse data
    landuse_labels = mock.mock_categorical_data(len(data_dict))
    landuse_classes, landuse_encodings = layers.encode_categorical(
        landuse_labels)
    # compute hill mixed uses
    D.compute_landuses(landuse_labels,
                       mixed_use_keys=['hill_branch_wt'],
                       qs=qs)
    # test against underlying method
    data_map = D._data
    mu_data_hill, mu_data_other, ac_data, ac_data_wt = data.aggregate_landuses(
        node_map,
        edge_map,
        node_edge_map,
        data_map,
        distances,
        betas,
        landuse_encodings,
        qs=qs,
        mixed_use_hill_keys=np.array([1]))
    for q_idx, q_key in enumerate(qs):
        for d_idx, d_key in enumerate(distances):
            assert np.allclose(
                N.metrics['mixed_uses']['hill_branch_wt'][q_key][d_key],
                mu_data_hill[0][q_idx][d_idx],
                atol=0.001,
                rtol=0)
    # gini simpson
    D.compute_landuses(landuse_labels, mixed_use_keys=['gini_simpson'])
    # test against underlying method
    data_map = D._data
    mu_data_hill, mu_data_other, ac_data, ac_data_wt = data.aggregate_landuses(
        node_map,
        edge_map,
        node_edge_map,
        data_map,
        distances,
        betas,
        landuse_encodings,
        mixed_use_other_keys=np.array([1]))
    for d_idx, d_key in enumerate(distances):
        assert np.allclose(N.metrics['mixed_uses']['gini_simpson'][d_key],
                           mu_data_other[0][d_idx],
                           atol=0.001,
                           rtol=0)
    # accessibilities
    D.compute_landuses(landuse_labels, accessibility_keys=['c'])
    # test against underlying method
    data_map = D._data
    mu_data_hill, mu_data_other, ac_data, ac_data_wt = data.aggregate_landuses(
        node_map,
        edge_map,
        node_edge_map,
        data_map,
        distances,
        betas,
        landuse_encodings,
        accessibility_keys=np.array([landuse_classes.index('c')]))
    for d_idx, d_key in enumerate(distances):
        assert np.allclose(
            N.metrics['accessibility']['non_weighted']['c'][d_key],
            ac_data[0][d_idx],
            atol=0.001,
            rtol=0)
        assert np.allclose(N.metrics['accessibility']['weighted']['c'][d_key],
                           ac_data_wt[0][d_idx],
                           atol=0.001,
                           rtol=0)
    # also check the number of returned types for a few assortments of metrics
    mixed_uses_hill_types = np.array([
        'hill', 'hill_branch_wt', 'hill_pairwise_wt', 'hill_pairwise_disparity'
    ])
    mixed_use_other_types = np.array(
        ['shannon', 'gini_simpson', 'raos_pairwise_disparity'])
    ac_codes = np.array(landuse_classes)
    # mixed uses hill
    mu_hill_random = np.arange(len(mixed_uses_hill_types))
    np.random.shuffle(mu_hill_random)
    # mixed uses other
    mu_other_random = np.arange(len(mixed_use_other_types))
    np.random.shuffle(mu_other_random)
    # accessibility
    ac_random = np.arange(len(landuse_classes))
    np.random.shuffle(ac_random)
    # mock disparity matrix
    mock_disparity_wt_matrix = np.full(
        (len(landuse_classes), len(landuse_classes)), 1)
    # not necessary to do all labels, first few should do
    for mu_h_min in range(3):
        mu_h_keys = np.array(mu_hill_random[mu_h_min:])
        for mu_o_min in range(3):
            mu_o_keys = np.array(mu_other_random[mu_o_min:])
            for ac_min in range(3):
                ac_keys = np.array(ac_random[ac_min:])
                # if all key arrays are empty, fall back to a single accessibility code, otherwise an error would be raised
                if len(mu_h_keys) == 0 and len(mu_o_keys) == 0 and len(ac_keys) == 0:
                    ac_keys = np.array([0])
                # randomise order of keys and metrics
                mu_h_metrics = mixed_uses_hill_types[mu_h_keys]
                mu_o_metrics = mixed_use_other_types[mu_o_keys]
                ac_metrics = ac_codes[ac_keys]
                # prepare network and compute
                N_temp = networks.NetworkLayerFromNX(primal_graph,
                                                     distances=distances)
                D_temp = layers.DataLayerFromDict(data_dict)
                D_temp.assign_to_network(N_temp, max_dist=500)
                D_temp.compute_landuses(
                    landuse_labels,
                    mixed_use_keys=list(mu_h_metrics) + list(mu_o_metrics),
                    accessibility_keys=ac_metrics,
                    cl_disparity_wt_matrix=mock_disparity_wt_matrix,
                    qs=qs)
                # test against underlying method
                mu_data_hill, mu_data_other, ac_data, ac_data_wt = \
                    data.aggregate_landuses(node_map,
                                            edge_map,
                                            node_edge_map,
                                            data_map,
                                            distances,
                                            betas,
                                            landuse_encodings,
                                            qs=qs,
                                            mixed_use_hill_keys=mu_h_keys,
                                            mixed_use_other_keys=mu_o_keys,
                                            accessibility_keys=ac_keys,
                                            cl_disparity_wt_matrix=mock_disparity_wt_matrix)
                for mu_h_idx, mu_h_met in enumerate(mu_h_metrics):
                    for q_idx, q_key in enumerate(qs):
                        for d_idx, d_key in enumerate(distances):
                            assert np.allclose(
                                N_temp.metrics['mixed_uses'][mu_h_met][q_key]
                                [d_key],
                                mu_data_hill[mu_h_idx][q_idx][d_idx],
                                atol=0.001,
                                rtol=0)
                for mu_o_idx, mu_o_met in enumerate(mu_o_metrics):
                    for d_idx, d_key in enumerate(distances):
                        assert np.allclose(
                            N_temp.metrics['mixed_uses'][mu_o_met][d_key],
                            mu_data_other[mu_o_idx][d_idx],
                            atol=0.001,
                            rtol=0)
                for ac_idx, ac_met in enumerate(ac_metrics):
                    for d_idx, d_key in enumerate(distances):
                        assert np.allclose(N_temp.metrics['accessibility']
                                           ['non_weighted'][ac_met][d_key],
                                           ac_data[ac_idx][d_idx],
                                           atol=0.001,
                                           rtol=0)
                        assert np.allclose(N_temp.metrics['accessibility']
                                           ['weighted'][ac_met][d_key],
                                           ac_data_wt[ac_idx][d_idx],
                                           atol=0.001,
                                           rtol=0)
    # most integrity checks happen in underlying method, though check here for mismatching labels length and typos
    with pytest.raises(ValueError):
        D.compute_landuses(landuse_labels[:-1], mixed_use_keys=['shannon'])
    with pytest.raises(ValueError):
        D.compute_landuses(landuse_labels, mixed_use_keys=['spelling_typo'])
    # accessibility_keys are not checked for typos: these only trigger a warning (not all labels will be in all data)
    # check that an unassigned data layer raises an error
    D_new = layers.DataLayerFromDict(data_dict)
    with pytest.raises(ValueError):
        D_new.compute_landuses(landuse_labels, mixed_use_keys=['shannon'])