def tedpca(data_cat, data_oc, combmode, mask, adaptive_mask, t2sG,
           ref_img, tes, algorithm='mdl', kdaw=10., rdaw=1.,
           out_dir='.', verbose=False, low_mem=False):
    """
    Use principal components analysis (PCA) to identify and remove thermal
    noise from multi-echo data.

    Parameters
    ----------
    data_cat : (S x E x T) array_like
        Input functional data
    data_oc : (S x T) array_like
        Optimally combined time series data
    combmode : {'t2s', 'paid'} str
        How optimal combination of echos should be made, where 't2s' indicates
        using the method of Posse 1999 and 'paid' indicates using the method of
        Poser 2006
    mask : (S,) array_like
        Boolean mask array
    adaptive_mask : (S,) array_like
        Array where each value indicates the number of echoes with good signal
        for that voxel. This mask may be thresholded; for example, with values
        less than 3 set to 0. For more information on thresholding, see
        `make_adaptive_mask`.
    t2sG : (S,) array_like
        Map of voxel-wise T2* estimates.
    ref_img : :obj:`str` or img_like
        Reference image to dictate how outputs are saved to disk
    tes : :obj:`list`
        List of echo times associated with `data_cat`, in milliseconds
    algorithm : {'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic', float}, optional
        Method with which to select components in TEDPCA. PCA decomposition
        with the mdl, kic and aic options is based on a Moving Average
        (stationary Gaussian) process and the options are ordered from most to
        least aggressive (see Li et al., 2007). If a float is provided, then
        it is assumed to represent the percentage of variance explained (0-1)
        to retain from PCA. Default is 'mdl'.
    kdaw : :obj:`float`, optional
        Dimensionality augmentation weight for Kappa calculations. Must be a
        non-negative float, or -1 (a special value). Default is 10.
    rdaw : :obj:`float`, optional
        Dimensionality augmentation weight for Rho calculations. Must be a
        non-negative float, or -1 (a special value). Default is 1.
    out_dir : :obj:`str`, optional
        Output directory.
    verbose : :obj:`bool`, optional
        Whether to output files from fitmodels_direct or not. Default: False
    low_mem : :obj:`bool`, optional
        Whether to use incremental PCA (for low-memory systems) or not.
        This is only compatible with the "kundu" or "kundu-stabilize"
        algorithms. Default: False

    Returns
    -------
    kept_data : (S x T) :obj:`numpy.ndarray`
        Dimensionally reduced optimally combined functional data
    n_components : :obj:`int`
        Number of components retained from PCA decomposition

    Notes
    -----
    ======================    =================================================
    Notation                  Meaning
    ======================    =================================================
    :math:`\\kappa`            Component pseudo-F statistic for TE-dependent
                              (BOLD) model.
    :math:`\\rho`              Component pseudo-F statistic for TE-independent
                              (artifact) model.
    :math:`v`                 Voxel
    :math:`V`                 Total number of voxels in mask
    :math:`\\zeta`             Spatial weight of a component in a voxel
    :math:`c`                 Component
    :math:`p`                 Exponent applied to the spatial weights
    ======================    =================================================

    Steps:

    1.  Variance normalize either multi-echo or optimally combined data,
        depending on settings.
    2.  Decompose normalized data using PCA or SVD.
    3.  Compute :math:`{\\kappa}` and :math:`{\\rho}`:

            .. math::
                {\\kappa}_c = \\frac{\\sum_{v}^V {\\zeta}_{c,v}^p * \
                F_{c,v,R_2^*}}{\\sum {\\zeta}_{c,v}^p}

                {\\rho}_c = \\frac{\\sum_{v}^V {\\zeta}_{c,v}^p * \
                F_{c,v,S_0}}{\\sum {\\zeta}_{c,v}^p}

    4.  Estimate elbows of the :math:`{\\kappa}`, :math:`{\\rho}`, and
        variance explained curves, which serve as selection thresholds.
    5.  Classify components as thermal noise if they meet both of the
        following criteria:

            - Nonsignificant :math:`{\\kappa}` and :math:`{\\rho}`.
            - Nonsignificant variance explained.

    Outputs:

    This function writes out several files:

    ======================    =================================================
    Filename                  Content
    ======================    =================================================
    pca_decomposition.json    PCA component table.
    pca_mixing.tsv            PCA mixing matrix.
    pca_components.nii.gz     Component weight maps.
    ======================    =================================================

    See Also
    --------
    :func:`tedana.utils.make_adaptive_mask` : The function used to create
        the ``adaptive_mask`` parameter.
    """
    if algorithm == 'kundu':
        alg_str = ("followed by the Kundu component selection decision "
                   "tree (Kundu et al., 2013)")
        RefLGR.info("Kundu, P., Brenowitz, N. D., Voon, V., Worbe, Y., "
                    "Vértes, P. E., Inati, S. J., ... & Bullmore, E. T. "
                    "(2013). Integrated strategy for improving functional "
                    "connectivity mapping using multiecho fMRI. Proceedings "
                    "of the National Academy of Sciences, 110(40), "
                    "16187-16192.")
    elif algorithm == 'kundu-stabilize':
        alg_str = ("followed by the 'stabilized' Kundu component "
                   "selection decision tree (Kundu et al., 2013)")
        RefLGR.info("Kundu, P., Brenowitz, N. D., Voon, V., Worbe, Y., "
                    "Vértes, P. E., Inati, S. J., ... & Bullmore, E. T. "
                    "(2013). Integrated strategy for improving functional "
                    "connectivity mapping using multiecho fMRI. Proceedings "
                    "of the National Academy of Sciences, 110(40), "
                    "16187-16192.")
    elif isinstance(algorithm, Number):
        alg_str = ("in which the number of components was determined based "
                   "on a variance explained threshold")
    else:
        alg_str = ("based on the PCA component estimation with a Moving "
                   "Average (stationary Gaussian) process (Li et al., 2007)")
        RefLGR.info("Li, Y.O., Adalı, T. and Calhoun, V.D., (2007). "
                    "Estimating the number of independent components for "
                    "functional magnetic resonance imaging data. "
                    "Human brain mapping, 28(11), pp.1251-1266.")

    RepLGR.info("Principal component analysis {0} was applied to "
                "the optimally combined data for dimensionality "
                "reduction.".format(alg_str))

    n_samp, n_echos, n_vols = data_cat.shape

    LGR.info('Computing PCA of optimally combined multi-echo data')
    data = data_oc[mask, :]

    data_z = ((data.T - data.T.mean(axis=0)) / data.T.std(axis=0)).T  # var normalize ts
    data_z = (data_z - data_z.mean()) / data_z.std()  # var normalize everything

    if algorithm in ['mdl', 'aic', 'kic']:
        data_img = io.new_nii_like(ref_img, utils.unmask(data, mask))
        mask_img = io.new_nii_like(ref_img, mask.astype(int))
        voxel_comp_weights, varex, varex_norm, comp_ts = ma_pca.ma_pca(
            data_img, mask_img, algorithm)
    elif isinstance(algorithm, Number):
        ppca = PCA(copy=False, n_components=algorithm, svd_solver="full")
        ppca.fit(data_z)
        comp_ts = ppca.components_.T
        varex = ppca.explained_variance_
        voxel_comp_weights = np.dot(np.dot(data_z, comp_ts),
                                    np.diag(1. / varex))
        varex_norm = varex / varex.sum()
    elif low_mem:
        voxel_comp_weights, varex, comp_ts = low_mem_pca(data_z)
        varex_norm = varex / varex.sum()
    else:
        ppca = PCA(copy=False, n_components=(n_vols - 1))
        ppca.fit(data_z)
        comp_ts = ppca.components_.T
        varex = ppca.explained_variance_
        voxel_comp_weights = np.dot(np.dot(data_z, comp_ts),
                                    np.diag(1. / varex))
        varex_norm = varex / varex.sum()

    # Compute Kappa and Rho for PCA comps
    # Normalize each component's time series
    vTmixN = stats.zscore(comp_ts, axis=0)
    comptable, _, _, _ = metrics.dependence_metrics(
        data_cat, data_oc, comp_ts, adaptive_mask, tes, ref_img,
        reindex=False, mmixN=vTmixN, algorithm=None,
        label='mepca_', out_dir=out_dir, verbose=verbose)

    # varex_norm from PCA overrides varex_norm from dependence_metrics,
    # but we retain the original
    comptable['estimated normalized variance explained'] = \
        comptable['normalized variance explained']
    comptable['normalized variance explained'] = varex_norm

    # write component maps to 4D image
    comp_ts_z = stats.zscore(comp_ts, axis=0)
    comp_maps = utils.unmask(computefeats2(data_oc, comp_ts_z, mask), mask)
    io.filewrite(comp_maps, op.join(out_dir, 'pca_components.nii.gz'), ref_img)

    # Select components using decision tree
    if algorithm == 'kundu':
        comptable = kundu_tedpca(comptable, n_echos, kdaw, rdaw,
                                 stabilize=False)
    elif algorithm == 'kundu-stabilize':
        comptable = kundu_tedpca(comptable, n_echos, kdaw, rdaw,
                                 stabilize=True)
    else:
        alg_str = ("variance explained-based" if isinstance(algorithm, Number)
                   else algorithm)
        LGR.info('Selected {0} components with {1} dimensionality '
                 'detection'.format(comptable.shape[0], alg_str))
        comptable['classification'] = 'accepted'
        comptable['rationale'] = ''

    # Save decomposition
    comp_names = [io.add_decomp_prefix(comp, prefix='pca',
                                       max_value=comptable.index.max())
                  for comp in comptable.index.values]
    mixing_df = pd.DataFrame(data=comp_ts, columns=comp_names)
    mixing_df.to_csv(op.join(out_dir, 'pca_mixing.tsv'), sep='\t', index=False)

    comptable['Description'] = 'PCA fit to optimally combined data.'
    mmix_dict = {}
    mmix_dict['Method'] = ('Principal components analysis implemented by '
                           'sklearn. Components are sorted by variance '
                           'explained in descending order. '
                           'Component signs are flipped to best match the '
                           'data.')
    io.save_comptable(comptable, op.join(out_dir, 'pca_decomposition.json'),
                      label='pca', metadata=mmix_dict)

    acc = comptable[comptable.classification == 'accepted'].index.values
    n_components = acc.size
    voxel_kept_comp_weighted = (voxel_comp_weights[:, acc] * varex[None, acc])
    kept_data = np.dot(voxel_kept_comp_weighted, comp_ts[:, acc].T)

    kept_data = stats.zscore(kept_data, axis=1)  # variance normalize time series
    kept_data = stats.zscore(kept_data, axis=None)  # variance normalize everything

    return kept_data, n_components
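
# ---------------------------------------------------------------------------
# A minimal, standalone sketch (not part of tedana's API) of the
# variance-explained branch above: scikit-learn's PCA accepts a float in
# (0, 1) as ``n_components`` when ``svd_solver="full"``, retaining just
# enough components to explain that fraction of the variance. The function
# name and toy data here are illustrative only.
def _example_variance_pca(variance_threshold=0.9):
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    data_z = rng.standard_normal((1000, 200))  # toy (voxels x volumes) z-scored data

    ppca = PCA(copy=False, n_components=variance_threshold, svd_solver="full")
    ppca.fit(data_z)
    comp_ts = ppca.components_.T  # (volumes x components) time series
    varex = ppca.explained_variance_
    # Project the data onto the components, scaling by inverse variance,
    # mirroring the ``voxel_comp_weights`` computation in ``tedpca`` above.
    voxel_comp_weights = np.dot(np.dot(data_z, comp_ts), np.diag(1. / varex))
    return comp_ts.shape[1], voxel_comp_weights
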
def generate_metrics(
    data_cat,
    data_optcom,
    mixing,
    adaptive_mask,
    tes,
    io_generator,
    label,
    metrics=None,
):
    """Fit TE-dependence and -independence models to components.

    Parameters
    ----------
    data_cat : (S x E x T) array_like
        Input data, where `S` is samples, `E` is echos, and `T` is time
    data_optcom : (S x T) array_like
        Optimally combined data
    mixing : (T x C) array_like
        Mixing matrix for converting input data to component space, where `C`
        is components and `T` is the same as in `data_cat`
    adaptive_mask : (S) array_like
        Array where each value indicates the number of echoes with good signal
        for that voxel. This mask may be thresholded; for example, with values
        less than 3 set to 0. For more information on thresholding, see
        `make_adaptive_mask`.
    tes : list
        List of echo times associated with `data_cat`, in milliseconds
    io_generator : tedana.io.OutputGenerator
        The output generator object for this workflow
    label : str in ['ICA', 'PCA']
        The label for this metric generation type
    metrics : list
        List of metrics to return

    Returns
    -------
    comptable : (C x X) :obj:`pandas.DataFrame`
        Component metric table. One row for each component, with a column for
        each metric. The index is the component number.
    """
    # Load metric dependency tree from json file
    dependency_config = op.join(utils.get_resource_path(), "config", "metrics.json")
    dependency_config = io.load_json(dependency_config)

    if metrics is None:
        metrics = ["map weight"]
    RepLGR.info("The following metrics were calculated: {}.".format(", ".join(metrics)))

    if not (data_cat.shape[0] == data_optcom.shape[0] == adaptive_mask.shape[0]):
        raise ValueError(
            "First dimensions (number of samples) of data_cat ({0}), "
            "data_optcom ({1}), and adaptive_mask ({2}) do not "
            "match".format(data_cat.shape[0], data_optcom.shape[0], adaptive_mask.shape[0])
        )
    elif data_cat.shape[1] != len(tes):
        raise ValueError(
            "Second dimension of data_cat ({0}) does not match "
            "number of echoes provided (tes; "
            "{1})".format(data_cat.shape[1], len(tes))
        )
    elif not (data_cat.shape[2] == data_optcom.shape[1] == mixing.shape[0]):
        raise ValueError(
            "Number of volumes in data_cat ({0}), "
            "data_optcom ({1}), and mixing ({2}) do not "
            "match.".format(data_cat.shape[2], data_optcom.shape[1], mixing.shape[0])
        )

    # Derive mask from thresholded adaptive mask
    mask = adaptive_mask >= 3

    # Apply masks before anything else
    data_cat = data_cat[mask, ...]
    data_optcom = data_optcom[mask, :]
    adaptive_mask = adaptive_mask[mask]

    # Ensure that echo times are in an array, rather than a list
    tes = np.asarray(tes)

    # Get reference image from io_generator
    ref_img = io_generator.reference_img

    required_metrics = dependency_resolver(
        dependency_config["dependencies"],
        metrics,
        dependency_config["inputs"],
    )

    # Use copy to avoid changing the original variable outside of this function
    mixing = mixing.copy()

    # Generate the component table, which will be filled out, column by column,
    # throughout this function
    n_components = mixing.shape[1]
    comptable = pd.DataFrame(index=np.arange(n_components, dtype=int))
    comptable["Component"] = [
        io.add_decomp_prefix(comp, prefix=label, max_value=comptable.shape[0])
        for comp in comptable.index.values
    ]

    # Metric maps
    # Maps will be stored as arrays in an easily-indexable dictionary
    metric_maps = {}
    if "map weight" in required_metrics:
        LGR.info("Calculating weight maps")
        metric_maps["map weight"] = dependence.calculate_weights(data_optcom, mixing)
        signs = determine_signs(metric_maps["map weight"], axis=0)
        comptable["optimal sign"] = signs
        metric_maps["map weight"], mixing = flip_components(
            metric_maps["map weight"], mixing, signs=signs
        )

    if "map optcom betas" in required_metrics:
        LGR.info("Calculating parameter estimate maps for optimally combined data")
        metric_maps["map optcom betas"] = dependence.calculate_betas(data_optcom, mixing)
        if io_generator.verbose:
            metric_maps["map echo betas"] = dependence.calculate_betas(data_cat, mixing)

    if "map percent signal change" in required_metrics:
        LGR.info("Calculating percent signal change maps")
        # used in kundu v3.2 tree
        metric_maps["map percent signal change"] = dependence.calculate_psc(
            data_optcom, metric_maps["map optcom betas"]
        )

    if "map Z" in required_metrics:
        LGR.info("Calculating z-statistic maps")
        metric_maps["map Z"] = dependence.calculate_z_maps(metric_maps["map weight"])

        if io_generator.verbose:
            io_generator.save_file(
                utils.unmask(metric_maps["map Z"] ** 2, mask),
                label + " component weights img",
            )

    if ("map FT2" in required_metrics) or ("map FS0" in required_metrics):
        LGR.info("Calculating F-statistic maps")
        m_T2, m_S0, p_m_T2, p_m_S0 = dependence.calculate_f_maps(
            data_cat, metric_maps["map Z"], mixing, adaptive_mask, tes
        )
        metric_maps["map FT2"] = m_T2
        metric_maps["map FS0"] = m_S0
        metric_maps["map predicted T2"] = p_m_T2
        metric_maps["map predicted S0"] = p_m_S0

    if "map Z clusterized" in required_metrics:
        LGR.info("Thresholding z-statistic maps")
        z_thresh = 1.95
        metric_maps["map Z clusterized"] = dependence.threshold_map(
            metric_maps["map Z"], mask, ref_img, z_thresh
        )

    if "map FT2 clusterized" in required_metrics:
        LGR.info("Calculating T2* F-statistic maps")
        f_thresh, _, _ = getfbounds(len(tes))
        metric_maps["map FT2 clusterized"] = dependence.threshold_map(
            metric_maps["map FT2"], mask, ref_img, f_thresh
        )

    if "map FS0 clusterized" in required_metrics:
        LGR.info("Calculating S0 F-statistic maps")
        f_thresh, _, _ = getfbounds(len(tes))
        metric_maps["map FS0 clusterized"] = dependence.threshold_map(
            metric_maps["map FS0"], mask, ref_img, f_thresh
        )

    # Intermediate metrics
    if "countsigFT2" in required_metrics:
        LGR.info("Counting significant voxels in T2* F-statistic maps")
        comptable["countsigFT2"] = dependence.compute_countsignal(
            metric_maps["map FT2 clusterized"]
        )

    if "countsigFS0" in required_metrics:
        LGR.info("Counting significant voxels in S0 F-statistic maps")
        comptable["countsigFS0"] = dependence.compute_countsignal(
            metric_maps["map FS0 clusterized"]
        )

    # Back to maps
    if "map beta T2 clusterized" in required_metrics:
        LGR.info("Thresholding optimal combination beta maps to match T2* F-statistic maps")
        metric_maps["map beta T2 clusterized"] = dependence.threshold_to_match(
            metric_maps["map optcom betas"], comptable["countsigFT2"], mask, ref_img
        )

    if "map beta S0 clusterized" in required_metrics:
        LGR.info("Thresholding optimal combination beta maps to match S0 F-statistic maps")
        metric_maps["map beta S0 clusterized"] = dependence.threshold_to_match(
            metric_maps["map optcom betas"], comptable["countsigFS0"], mask, ref_img
        )

    # Dependence metrics
    if ("kappa" in required_metrics) or ("rho" in required_metrics):
        LGR.info("Calculating kappa and rho")
        comptable["kappa"], comptable["rho"] = dependence.calculate_dependence_metrics(
            F_T2_maps=metric_maps["map FT2"],
            F_S0_maps=metric_maps["map FS0"],
            Z_maps=metric_maps["map Z"],
        )

    # Generic metrics
    if "variance explained" in required_metrics:
        LGR.info("Calculating variance explained")
        comptable["variance explained"] = dependence.calculate_varex(
            metric_maps["map optcom betas"]
        )

    if "normalized variance explained" in required_metrics:
        LGR.info("Calculating normalized variance explained")
        comptable["normalized variance explained"] = dependence.calculate_varex_norm(
            metric_maps["map weight"]
        )

    # Spatial metrics
    if "dice_FT2" in required_metrics:
        LGR.info(
            "Calculating DSI between thresholded T2* F-statistic and "
            "optimal combination beta maps"
        )
        comptable["dice_FT2"] = dependence.compute_dice(
            metric_maps["map beta T2 clusterized"],
            metric_maps["map FT2 clusterized"],
            axis=0,
        )

    if "dice_FS0" in required_metrics:
        LGR.info(
            "Calculating DSI between thresholded S0 F-statistic and "
            "optimal combination beta maps"
        )
        comptable["dice_FS0"] = dependence.compute_dice(
            metric_maps["map beta S0 clusterized"],
            metric_maps["map FS0 clusterized"],
            axis=0,
        )

    if "signal-noise_t" in required_metrics:
        LGR.info("Calculating signal-noise t-statistics")
        RepLGR.info(
            "A t-test was performed between the distributions of T2*-model "
            "F-statistics associated with clusters (i.e., signal) and "
            "non-cluster voxels (i.e., noise) to generate a t-statistic "
            "(metric signal-noise_t) and p-value (metric signal-noise_p) "
            "measuring relative association of the component to signal "
            "over noise."
        )
        (
            comptable["signal-noise_t"],
            comptable["signal-noise_p"],
        ) = dependence.compute_signal_minus_noise_t(
            Z_maps=metric_maps["map Z"],
            Z_clmaps=metric_maps["map Z clusterized"],
            F_T2_maps=metric_maps["map FT2"],
        )

    if "signal-noise_z" in required_metrics:
        LGR.info("Calculating signal-noise z-statistics")
        RepLGR.info(
            "A t-test was performed between the distributions of T2*-model "
            "F-statistics associated with clusters (i.e., signal) and "
            "non-cluster voxels (i.e., noise) to generate a z-statistic "
            "(metric signal-noise_z) and p-value (metric signal-noise_p) "
            "measuring relative association of the component to signal "
            "over noise."
        )
        (
            comptable["signal-noise_z"],
            comptable["signal-noise_p"],
        ) = dependence.compute_signal_minus_noise_z(
            Z_maps=metric_maps["map Z"],
            Z_clmaps=metric_maps["map Z clusterized"],
            F_T2_maps=metric_maps["map FT2"],
        )

    if "countnoise" in required_metrics:
        LGR.info("Counting significant noise voxels from z-statistic maps")
        RepLGR.info(
            "The number of significant voxels not from clusters was "
            "calculated for each component."
        )
        comptable["countnoise"] = dependence.compute_countnoise(
            metric_maps["map Z"], metric_maps["map Z clusterized"]
        )

    # Composite metrics
    if "d_table_score" in required_metrics:
        LGR.info("Calculating decision table score")
        comptable["d_table_score"] = dependence.generate_decision_table_score(
            comptable["kappa"],
            comptable["dice_FT2"],
            comptable["signal-noise_t"],
            comptable["countnoise"],
            comptable["countsigFT2"],
        )

    # Write verbose metrics if needed
    if io_generator.verbose:
        write_betas = "map echo betas" in metric_maps
        write_T2S0 = "map predicted T2" in metric_maps
        if write_betas:
            betas = metric_maps["map echo betas"]
        if write_T2S0:
            pred_T2_maps = metric_maps["map predicted T2"]
            pred_S0_maps = metric_maps["map predicted S0"]

        for i_echo in range(len(tes)):
            if write_betas:
                echo_betas = betas[:, i_echo, :]
                io_generator.save_file(
                    utils.unmask(echo_betas, mask),
                    "echo weight " + label + " map split img",
                    echo=(i_echo + 1),
                )

            if write_T2S0:
                echo_pred_T2_maps = pred_T2_maps[:, i_echo, :]
                io_generator.save_file(
                    utils.unmask(echo_pred_T2_maps, mask),
                    "echo T2 " + label + " split img",
                    echo=(i_echo + 1),
                )

                echo_pred_S0_maps = pred_S0_maps[:, i_echo, :]
                io_generator.save_file(
                    utils.unmask(echo_pred_S0_maps, mask),
                    "echo S0 " + label + " split img",
                    echo=(i_echo + 1),
                )

    # Reorder component table columns based on previous tedana versions
    # NOTE: Some new columns will be calculated and columns may be reordered during
    # component selection
    preferred_order = (
        "Component",
        "kappa",
        "rho",
        "variance explained",
        "normalized variance explained",
        "estimated normalized variance explained",
        "countsigFT2",
        "countsigFS0",
        "dice_FT2",
        "dice_FS0",
        "countnoise",
        "signal-noise_t",
        "signal-noise_p",
        "d_table_score",
        "kappa ratio",
        "d_table_score_scrub",
        "classification",
        "rationale",
    )
    first_columns = [col for col in preferred_order if col in comptable.columns]
    other_columns = [col for col in comptable.columns if col not in preferred_order]
    comptable = comptable[first_columns + other_columns]

    return comptable
def tedana_workflow(data, tes, out_dir='.', mask=None, fittype='loglin',
                    combmode='t2s', tedpca='mdl', fixed_seed=42, maxit=500,
                    maxrestart=10, tedort=False, gscontrol=None,
                    no_png=False, png_cmap='coolwarm', verbose=False,
                    low_mem=False, debug=False, quiet=False,
                    t2smap=None, mixm=None, ctab=None, manacc=None):
    """
    Run the "canonical" TE-Dependent ANAlysis workflow.

    Parameters
    ----------
    data : :obj:`str` or :obj:`list` of :obj:`str`
        Either a single z-concatenated file (single-entry list or str) or a
        list of echo-specific files, in ascending order.
    tes : :obj:`list`
        List of echo times associated with data in milliseconds.
    out_dir : :obj:`str`, optional
        Output directory.
    mask : :obj:`str` or None, optional
        Binary mask of voxels to include in TE Dependent ANAlysis. Must be
        spatially aligned with `data`. If an explicit mask is not provided,
        then Nilearn's compute_epi_mask function will be used to derive a mask
        from the first echo's data.
    fittype : {'loglin', 'curvefit'}, optional
        Monoexponential fitting method. 'loglin' uses the default linear
        fit to the log of the data. 'curvefit' uses a monoexponential fit to
        the raw data, which is slightly slower but may be more accurate.
        Default is 'loglin'.
    combmode : {'t2s'}, optional
        Combination scheme for TEs: 't2s' (Posse 1999, default).
    tedpca : {'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic'}, optional
        Method with which to select components in TEDPCA. Default is 'mdl'.
    tedort : :obj:`bool`, optional
        Orthogonalize rejected components w.r.t. accepted ones prior to
        denoising. Default is False.
    gscontrol : {None, 't1c', 'gsr'} or :obj:`list`, optional
        Perform additional denoising to remove spatially diffuse noise.
        Default is None.
    verbose : :obj:`bool`, optional
        Generate intermediate and additional files. Default is False.
    no_png : :obj:`bool`, optional
        Do not generate .png plots and figures. Default is False.
    png_cmap : :obj:`str`, optional
        Name of a matplotlib colormap to be used when generating figures.
        Cannot be used with --no-png. Default is 'coolwarm'.
    t2smap : :obj:`str`, optional
        Precalculated T2* map in the same space as the input data.
    mixm : :obj:`str` or None, optional
        File containing mixing matrix, to be used when re-running the
        workflow. If not provided, ME-PCA and ME-ICA are done. Default is
        None.
    ctab : :obj:`str` or None, optional
        File containing component table from which to extract pre-computed
        classifications, to be used with 'mixm' when re-running the workflow.
        Default is None.
    manacc : :obj:`list`, :obj:`str`, or None, optional
        List of manually accepted components. Can be a list of the
        components, a comma-separated string with component numbers, or
        None. Default is None.

    Other Parameters
    ----------------
    fixed_seed : :obj:`int`, optional
        Value passed to ``mdp.numx_rand.seed()``.
        Set to a positive integer value for reproducible ICA results;
        otherwise, set to -1 for varying results across calls.
    maxit : :obj:`int`, optional
        Maximum number of iterations for ICA. Default is 500.
    maxrestart : :obj:`int`, optional
        Maximum number of attempts for ICA. If ICA fails to converge, the
        fixed seed will be updated and ICA will be run again. If convergence
        is achieved before maxrestart attempts, ICA will finish early.
        Default is 10.
    low_mem : :obj:`bool`, optional
        Enables low-memory processing, including the use of IncrementalPCA.
        May increase workflow duration. Default is False.
    debug : :obj:`bool`, optional
        Whether to run in debugging mode or not. Default is False.
    quiet : :obj:`bool`, optional
        If True, suppresses logging/printing of messages. Default is False.

    Notes
    -----
    This workflow writes out several files. For a complete list of the files
    generated by this workflow, please visit
    https://tedana.readthedocs.io/en/latest/outputs.html
    """
    out_dir = op.abspath(out_dir)
    if not op.isdir(out_dir):
        os.mkdir(out_dir)

    # boilerplate
    basename = 'report'
    extension = 'txt'
    repname = op.join(out_dir, (basename + '.' + extension))
    repex = op.join(out_dir, (basename + '*'))
    previousreps = glob(repex)
    previousreps.sort(reverse=True)
    for f in previousreps:
        previousparts = op.splitext(f)
        newname = previousparts[0] + '_old' + previousparts[1]
        os.rename(f, newname)
    refname = op.join(out_dir, '_references.txt')

    # create logfile name
    basename = 'tedana_'
    extension = 'tsv'
    start_time = datetime.datetime.now().strftime('%Y-%m-%dT%H%M%S')
    logname = op.join(out_dir, (basename + start_time + '.' + extension))

    # set logging format
    log_formatter = logging.Formatter(
        '%(asctime)s\t%(name)-12s\t%(levelname)-8s\t%(message)s',
        datefmt='%Y-%m-%dT%H:%M:%S')
    text_formatter = logging.Formatter('%(message)s')

    # set up logging file and open it for writing
    log_handler = logging.FileHandler(logname)
    log_handler.setFormatter(log_formatter)
    # Removing handlers after basicConfig doesn't work, so we use filters
    # for the relevant handlers themselves.
    log_handler.addFilter(ContextFilter())
    sh = logging.StreamHandler()
    sh.addFilter(ContextFilter())

    if quiet:
        logging.basicConfig(level=logging.WARNING,
                            handlers=[log_handler, sh])
    elif debug:
        logging.basicConfig(level=logging.DEBUG,
                            handlers=[log_handler, sh])
    else:
        logging.basicConfig(level=logging.INFO,
                            handlers=[log_handler, sh])

    # Loggers for report and references
    rep_handler = logging.FileHandler(repname)
    rep_handler.setFormatter(text_formatter)
    ref_handler = logging.FileHandler(refname)
    ref_handler.setFormatter(text_formatter)
    RepLGR.setLevel(logging.INFO)
    RepLGR.addHandler(rep_handler)
    RefLGR.setLevel(logging.INFO)
    RefLGR.addHandler(ref_handler)

    LGR.info('Using output directory: {}'.format(out_dir))

    # ensure tes are in appropriate format
    tes = [float(te) for te in tes]
    n_echos = len(tes)

    # Coerce gscontrol to list
    if not isinstance(gscontrol, list):
        gscontrol = [gscontrol]

    LGR.info('Loading input data: {}'.format([f for f in data]))
    catd, ref_img = io.load_data(data, n_echos=n_echos)
    n_samp, n_echos, n_vols = catd.shape
    LGR.debug('Resulting data shape: {}'.format(catd.shape))

    if no_png and (png_cmap != 'coolwarm'):
        LGR.warning('Overriding --no-png since --png-cmap provided.')
        no_png = False

    # check if TR is 0
    img_t_r = ref_img.header.get_zooms()[-1]
    if img_t_r == 0 and not no_png:
        raise IOError(
            'Dataset has a TR of 0. This indicates incorrect'
            ' header information. To correct this, we recommend'
            ' using this snippet:'
            '\n'
            'https://gist.github.com/jbteves/032c87aeb080dd8de8861cb151bff5d6'
            '\n'
            'to correct your TR to the value it should be.')

    if mixm is not None and op.isfile(mixm):
        mixm = op.abspath(mixm)
        # Allow users to re-run on same folder
        if mixm != op.join(out_dir, 'ica_mixing.tsv'):
            shutil.copyfile(mixm, op.join(out_dir, 'ica_mixing.tsv'))
            shutil.copyfile(mixm, op.join(out_dir, op.basename(mixm)))
    elif mixm is not None:
        raise IOError('Argument "mixm" must be an existing file.')

    if ctab is not None and op.isfile(ctab):
        ctab = op.abspath(ctab)
        # Allow users to re-run on same folder
        if ctab != op.join(out_dir, 'ica_decomposition.json'):
            shutil.copyfile(ctab, op.join(out_dir, 'ica_decomposition.json'))
            shutil.copyfile(ctab, op.join(out_dir, op.basename(ctab)))
    elif ctab is not None:
        raise IOError('Argument "ctab" must be an existing file.')

    if isinstance(manacc, str):
        manacc = [int(comp) for comp in manacc.split(',')]

    if ctab and not mixm:
        LGR.warning('Argument "ctab" requires argument "mixm".')
        ctab = None
    elif manacc is not None and not mixm:
        LGR.warning('Argument "manacc" requires argument "mixm".')
        manacc = None

    if t2smap is not None and op.isfile(t2smap):
        t2smap = op.abspath(t2smap)
        # Allow users to re-run on same folder
        if t2smap != op.join(out_dir, 't2sv.nii.gz'):
            shutil.copyfile(t2smap, op.join(out_dir, 't2sv.nii.gz'))
            shutil.copyfile(t2smap, op.join(out_dir, op.basename(t2smap)))
    elif t2smap is not None:
        raise IOError('Argument "t2smap" must be an existing file.')

    RepLGR.info("TE-dependence analysis was performed on input data.")
    if mask and not t2smap:
        # TODO: add affine check
        LGR.info('Using user-defined mask')
        RepLGR.info("A user-defined mask was applied to the data.")
    elif t2smap and not mask:
        LGR.info('Using user-defined T2* map to generate mask')
        t2s_limited = utils.load_image(t2smap)
        t2s_full = t2s_limited.copy()
        mask = (t2s_limited != 0).astype(int)
    elif t2smap and mask:
        LGR.info('Combining user-defined mask and T2* map to generate mask')
        t2s_limited = utils.load_image(t2smap)
        t2s_full = t2s_limited.copy()
        mask = utils.load_image(mask)
        mask[t2s_limited == 0] = 0  # reduce mask based on T2* map
    else:
        LGR.info('Computing EPI mask from first echo')
        first_echo_img = io.new_nii_like(ref_img, catd[:, 0, :])
        mask = compute_epi_mask(first_echo_img)
        RepLGR.info("An initial mask was generated from the first echo using "
                    "nilearn's compute_epi_mask function.")

    mask, masksum = utils.make_adaptive_mask(catd, mask=mask, getsum=True)
    LGR.debug('Retaining {}/{} samples'.format(mask.sum(), n_samp))
    io.filewrite(masksum, op.join(out_dir, 'adaptive_mask.nii'), ref_img)

    if t2smap is None:
        LGR.info('Computing T2* map')
        t2s_limited, s0_limited, t2s_full, s0_full = decay.fit_decay(
            catd, tes, mask, masksum, fittype)

        # set a hard cap for the T2* map
        # anything that is 10x higher than the 99.5 %ile will be reset to 99.5 %ile
        cap_t2s = stats.scoreatpercentile(t2s_limited.flatten(), 99.5,
                                          interpolation_method='lower')
        LGR.debug('Setting cap on T2* map at {:.5f}'.format(cap_t2s * 10))
        t2s_limited[t2s_limited > cap_t2s * 10] = cap_t2s
        io.filewrite(t2s_limited, op.join(out_dir, 't2sv.nii'), ref_img)
        io.filewrite(s0_limited, op.join(out_dir, 's0v.nii'), ref_img)

        if verbose:
            io.filewrite(t2s_full, op.join(out_dir, 't2svG.nii'), ref_img)
            io.filewrite(s0_full, op.join(out_dir, 's0vG.nii'), ref_img)

    # optimally combine data
    data_oc = combine.make_optcom(catd, tes, mask, t2s=t2s_full,
                                  combmode=combmode)

    # regress out global signal unless explicitly not desired
    if 'gsr' in gscontrol:
        catd, data_oc = gsc.gscontrol_raw(catd, data_oc, n_echos, ref_img,
                                          out_dir=out_dir)

    if mixm is None:
        # Identify and remove thermal noise from data
        dd, n_components = decomposition.tedpca(catd, data_oc, combmode, mask,
                                                t2s_limited, t2s_full, ref_img,
                                                tes=tes, algorithm=tedpca,
                                                kdaw=10., rdaw=1.,
                                                out_dir=out_dir,
                                                verbose=verbose,
                                                low_mem=low_mem)
        mmix_orig = decomposition.tedica(dd, n_components, fixed_seed,
                                         maxit, maxrestart)

        if verbose:
            io.filewrite(utils.unmask(dd, mask),
                         op.join(out_dir, 'ts_OC_whitened.nii.gz'), ref_img)

        LGR.info('Making second component selection guess from ICA results')
        # Estimate betas and compute selection metrics for mixing matrix
        # generated from dimensionally reduced data using full data (i.e.,
        # data with thermal noise)
        comptable, metric_maps, betas, mmix = metrics.dependence_metrics(
            catd, data_oc, mmix_orig, t2s_limited, tes,
            ref_img, reindex=True, label='meica_', out_dir=out_dir,
            algorithm='kundu_v2', verbose=verbose)
        comp_names = [io.add_decomp_prefix(comp, prefix='ica',
                                           max_value=comptable.index.max())
                      for comp in comptable.index.values]
        mixing_df = pd.DataFrame(data=mmix, columns=comp_names)
        mixing_df.to_csv(op.join(out_dir, 'ica_mixing.tsv'),
                         sep='\t', index=False)
        betas_oc = utils.unmask(computefeats2(data_oc, mmix, mask), mask)
        io.filewrite(betas_oc,
                     op.join(out_dir, 'ica_components.nii.gz'),
                     ref_img)

        comptable = metrics.kundu_metrics(comptable, metric_maps)
        comptable = selection.kundu_selection_v2(comptable, n_echos, n_vols)
    else:
        LGR.info('Using supplied mixing matrix from ICA')
        mmix_orig = pd.read_table(op.join(out_dir, 'ica_mixing.tsv')).values

        if ctab is None:
            comptable, metric_maps, betas, mmix = metrics.dependence_metrics(
                catd, data_oc, mmix_orig, t2s_limited, tes,
                ref_img, label='meica_', out_dir=out_dir,
                algorithm='kundu_v2', verbose=verbose)
            comptable = metrics.kundu_metrics(comptable, metric_maps)
            comptable = selection.kundu_selection_v2(comptable, n_echos,
                                                     n_vols)
        else:
            mmix = mmix_orig.copy()
            comptable = io.load_comptable(ctab)
            if manacc is not None:
                comptable = selection.manual_selection(comptable, acc=manacc)
        betas_oc = utils.unmask(computefeats2(data_oc, mmix, mask), mask)
        io.filewrite(betas_oc,
                     op.join(out_dir, 'ica_components.nii.gz'),
                     ref_img)

    # Save decomposition
    comptable['Description'] = ('ICA fit to dimensionally-reduced optimally '
                                'combined data.')
    mmix_dict = {}
    mmix_dict['Method'] = ('Independent components analysis with FastICA '
                           'algorithm implemented by sklearn. Components '
                           'are sorted by Kappa in descending order. '
                           'Component signs are flipped to best match the '
                           'data.')
    io.save_comptable(comptable, op.join(out_dir, 'ica_decomposition.json'),
                      label='ica', metadata=mmix_dict)

    if comptable[comptable.classification == 'accepted'].shape[0] == 0:
        LGR.warning('No BOLD components detected! Please check data and '
                    'results!')

    mmix_orig = mmix.copy()
    if tedort:
        acc_idx = comptable.loc[
            ~comptable.classification.str.contains('rejected')].index.values
        rej_idx = comptable.loc[
            comptable.classification.str.contains('rejected')].index.values
        acc_ts = mmix[:, acc_idx]
        rej_ts = mmix[:, rej_idx]
        betas = np.linalg.lstsq(acc_ts, rej_ts, rcond=None)[0]
        pred_rej_ts = np.dot(acc_ts, betas)
        resid = rej_ts - pred_rej_ts
        mmix[:, rej_idx] = resid
        comp_names = [io.add_decomp_prefix(comp, prefix='ica',
                                           max_value=comptable.index.max())
                      for comp in comptable.index.values]
        mixing_df = pd.DataFrame(data=mmix, columns=comp_names)
        mixing_df.to_csv(op.join(out_dir, 'ica_orth_mixing.tsv'),
                         sep='\t', index=False)
        RepLGR.info("Rejected components' time series were then "
                    "orthogonalized with respect to accepted components' time "
                    "series.")

    io.writeresults(data_oc, mask=mask, comptable=comptable, mmix=mmix,
                    n_vols=n_vols, ref_img=ref_img, out_dir=out_dir)

    if 't1c' in gscontrol:
        gsc.gscontrol_mmix(data_oc, mmix, mask, comptable, ref_img,
                           out_dir=out_dir)

    if verbose:
        io.writeresults_echoes(catd, mmix, mask, comptable, ref_img,
                               out_dir=out_dir)

    if not no_png:
        LGR.info('Making figures folder with static component maps and '
                 'timecourse plots.')
        # make figure folder first
        if not op.isdir(op.join(out_dir, 'figures')):
            os.mkdir(op.join(out_dir, 'figures'))

        viz.write_comp_figs(data_oc, mask=mask, comptable=comptable,
                            mmix=mmix_orig, ref_img=ref_img,
                            out_dir=op.join(out_dir, 'figures'),
                            png_cmap=png_cmap)

        LGR.info('Making Kappa vs Rho scatter plot')
        viz.write_kappa_scatter(comptable=comptable,
                                out_dir=op.join(out_dir, 'figures'))

        LGR.info('Making Kappa/Rho scree plot')
        viz.write_kappa_scree(comptable=comptable,
                              out_dir=op.join(out_dir, 'figures'))

        LGR.info('Making overall summary figure')
        viz.write_summary_fig(comptable=comptable,
                              out_dir=op.join(out_dir, 'figures'))

    LGR.info('Workflow completed')

    RepLGR.info("This workflow used numpy (Van Der Walt, Colbert, & "
                "Varoquaux, 2011), scipy (Jones et al., 2001), pandas "
                "(McKinney, 2010), scikit-learn (Pedregosa et al., 2011), "
                "nilearn, and nibabel (Brett et al., 2019).")
    RefLGR.info("Van Der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). "
                "The NumPy array: a structure for efficient numerical "
                "computation. Computing in Science & Engineering, 13(2), 22.")
    RefLGR.info("Jones E, Oliphant T, Peterson P, et al. SciPy: Open Source "
                "Scientific Tools for Python, 2001-, http://www.scipy.org/")
    RefLGR.info("McKinney, W. (2010, June). Data structures for statistical "
                "computing in python. In Proceedings of the 9th Python in "
                "Science Conference (Vol. 445, pp. 51-56).")
    RefLGR.info("Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., "
                "Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). "
                "Scikit-learn: Machine learning in Python. Journal of machine "
                "learning research, 12(Oct), 2825-2830.")
    RefLGR.info("Brett, M., Markiewicz, C. J., Hanke, M., Côté, M.-A., "
                "Cipollini, B., McCarthy, P., … freec84. (2019, May 28). "
                "nipy/nibabel. Zenodo. http://doi.org/10.5281/zenodo.3233118")
    RepLGR.info("This workflow also used the Dice similarity index "
                "(Dice, 1945; Sørensen, 1948).")
    RefLGR.info("Dice, L. R. (1945). Measures of the amount of ecologic "
                "association between species. Ecology, 26(3), 297-302.")
    RefLGR.info("Sørensen, T. J. (1948). A method of establishing groups of "
                "equal amplitude in plant sociology based on similarity of "
                "species content and its application to analyses of the "
                "vegetation on Danish commons. I kommission hos E. "
                "Munksgaard.")

    with open(repname, 'r') as fo:
        report = [line.rstrip() for line in fo.readlines()]
        report = ' '.join(report)
    with open(refname, 'r') as fo:
        reference_list = sorted(list(set(fo.readlines())))
        references = '\n'.join(reference_list)
    report += '\n\nReferences\n' + references
    with open(repname, 'w') as fo:
        fo.write(report)
    os.remove(refname)

    for handler in logging.root.handlers[:]:
        logging.root.removeHandler(handler)
def tedana_workflow(data, tes, out_dir='.', mask=None, convention='bids',
                    prefix='', fittype='loglin', combmode='t2s', tedpca='mdl',
                    fixed_seed=42, maxit=500, maxrestart=10, tedort=False,
                    gscontrol=None, no_reports=False, png_cmap='coolwarm',
                    verbose=False, low_mem=False, debug=False, quiet=False,
                    t2smap=None, mixm=None, ctab=None, manacc=None):
    """
    Run the "canonical" TE-Dependent ANAlysis workflow.

    Parameters
    ----------
    data : :obj:`str` or :obj:`list` of :obj:`str`
        Either a single z-concatenated file (single-entry list or str) or a
        list of echo-specific files, in ascending order.
    tes : :obj:`list`
        List of echo times associated with data in milliseconds.
    out_dir : :obj:`str`, optional
        Output directory.
    mask : :obj:`str` or None, optional
        Binary mask of voxels to include in TE Dependent ANAlysis. Must be
        spatially aligned with `data`. If an explicit mask is not provided,
        then Nilearn's compute_epi_mask function will be used to derive a mask
        from the first echo's data.
    fittype : {'loglin', 'curvefit'}, optional
        Monoexponential fitting method. 'loglin' uses the default linear
        fit to the log of the data. 'curvefit' uses a monoexponential fit to
        the raw data, which is slightly slower but may be more accurate.
        Default is 'loglin'.
    combmode : {'t2s'}, optional
        Combination scheme for TEs: 't2s' (Posse 1999, default).
    tedpca : {'mdl', 'aic', 'kic', 'kundu', 'kundu-stabilize', float}, optional
        Method with which to select components in TEDPCA.
        If a float is provided, then it is assumed to represent the
        percentage of variance explained (0-1) to retain from PCA.
        Default is 'mdl'.
    tedort : :obj:`bool`, optional
        Orthogonalize rejected components w.r.t. accepted ones prior to
        denoising. Default is False.
    gscontrol : {None, 'mir', 'gsr'} or :obj:`list`, optional
        Perform additional denoising to remove spatially diffuse noise.
        Default is None.
    verbose : :obj:`bool`, optional
        Generate intermediate and additional files. Default is False.
    no_reports : :obj:`bool`, optional
        Do not generate .html reports and .png plots. Default is False, such
        that reports are generated.
    png_cmap : :obj:`str`, optional
        Name of a matplotlib colormap to be used when generating figures.
        Cannot be used with --no-reports. Default is 'coolwarm'.
    t2smap : :obj:`str`, optional
        Precalculated T2* map in the same space as the input data. Values in
        the map must be in seconds.
    mixm : :obj:`str` or None, optional
        File containing mixing matrix, to be used when re-running the
        workflow. If not provided, ME-PCA and ME-ICA are done. Default is
        None.
    ctab : :obj:`str` or None, optional
        File containing component table from which to extract pre-computed
        classifications, to be used with 'mixm' when re-running the workflow.
        Default is None.
    manacc : :obj:`list` of :obj:`int` or None, optional
        List of manually accepted components. Can be a list of the component
        numbers or None. If provided, this parameter requires ``mixm`` and
        ``ctab`` to be provided as well. Default is None.

    Other Parameters
    ----------------
    fixed_seed : :obj:`int`, optional
        Value passed to ``mdp.numx_rand.seed()``.
        Set to a positive integer value for reproducible ICA results;
        otherwise, set to -1 for varying results across calls.
    maxit : :obj:`int`, optional
        Maximum number of iterations for ICA. Default is 500.
    maxrestart : :obj:`int`, optional
        Maximum number of attempts for ICA. If ICA fails to converge, the
        fixed seed will be updated and ICA will be run again. If convergence
        is achieved before maxrestart attempts, ICA will finish early.
        Default is 10.
    low_mem : :obj:`bool`, optional
        Enables low-memory processing, including the use of IncrementalPCA.
        May increase workflow duration. Default is False.
    debug : :obj:`bool`, optional
        Whether to run in debugging mode or not. Default is False.
    quiet : :obj:`bool`, optional
        If True, suppresses logging/printing of messages. Default is False.

    Notes
    -----
    This workflow writes out several files. For a complete list of the files
    generated by this workflow, please visit
    https://tedana.readthedocs.io/en/latest/outputs.html
    """
    out_dir = op.abspath(out_dir)
    if not op.isdir(out_dir):
        os.mkdir(out_dir)

    # boilerplate
    basename = 'report'
    extension = 'txt'
    repname = op.join(out_dir, (basename + '.' + extension))
    repex = op.join(out_dir, (basename + '*'))
    previousreps = glob(repex)
    previousreps.sort(reverse=True)
    for f in previousreps:
        previousparts = op.splitext(f)
        newname = previousparts[0] + '_old' + previousparts[1]
        os.rename(f, newname)
    refname = op.join(out_dir, '_references.txt')

    # create logfile name
    basename = 'tedana_'
    extension = 'tsv'
    start_time = datetime.datetime.now().strftime('%Y-%m-%dT%H%M%S')
    logname = op.join(out_dir, (basename + start_time + '.' + extension))
    utils.setup_loggers(logname, repname, refname, quiet=quiet, debug=debug)

    LGR.info('Using output directory: {}'.format(out_dir))

    # ensure tes are in appropriate format
    tes = [float(te) for te in tes]
    n_echos = len(tes)

    # Coerce gscontrol to list
    if not isinstance(gscontrol, list):
        gscontrol = [gscontrol]

    # Check value of tedpca *if* it is a float
    tedpca = check_tedpca_value(tedpca, is_parser=False)

    LGR.info('Loading input data: {}'.format([f for f in data]))
    catd, ref_img = io.load_data(data, n_echos=n_echos)
    io_generator = io.OutputGenerator(
        ref_img,
        convention=convention,
        out_dir=out_dir,
        prefix=prefix,
        config="auto",
        verbose=verbose,
    )

    n_samp, n_echos, n_vols = catd.shape
    LGR.debug('Resulting data shape: {}'.format(catd.shape))

    # check if TR is 0
    img_t_r = io_generator.reference_img.header.get_zooms()[-1]
    if img_t_r == 0:
        raise IOError(
            'Dataset has a TR of 0. This indicates incorrect'
            ' header information. To correct this, we recommend'
            ' using this snippet:'
            '\n'
            'https://gist.github.com/jbteves/032c87aeb080dd8de8861cb151bff5d6'
            '\n'
            'to correct your TR to the value it should be.')

    if mixm is not None and op.isfile(mixm):
        mixm = op.abspath(mixm)
        # Allow users to re-run on same folder
        mixing_name = io_generator.get_name("ICA mixing tsv")
        if mixm != mixing_name:
            shutil.copyfile(mixm, mixing_name)
            shutil.copyfile(mixm, op.join(io_generator.out_dir,
                                          op.basename(mixm)))
    elif mixm is not None:
        raise IOError('Argument "mixm" must be an existing file.')

    if ctab is not None and op.isfile(ctab):
        ctab = op.abspath(ctab)
        # Allow users to re-run on same folder
        metrics_name = io_generator.get_name("ICA metrics tsv")
        if ctab != metrics_name:
            shutil.copyfile(ctab, metrics_name)
            shutil.copyfile(ctab, op.join(io_generator.out_dir,
                                          op.basename(ctab)))
    elif ctab is not None:
        raise IOError('Argument "ctab" must be an existing file.')

    if ctab and not mixm:
        LGR.warning('Argument "ctab" requires argument "mixm".')
        ctab = None
    elif manacc is not None and (not mixm or not ctab):
        LGR.warning('Argument "manacc" requires arguments "mixm" and "ctab".')
        manacc = None
    elif manacc is not None:
        # coerce to list of integers
        manacc = [int(m) for m in manacc]

    if t2smap is not None and op.isfile(t2smap):
        t2smap_file = io_generator.get_name('t2star img')
        t2smap = op.abspath(t2smap)
        # Allow users to re-run on same folder
        if t2smap != t2smap_file:
            shutil.copyfile(t2smap, t2smap_file)
    elif t2smap is not None:
        raise IOError('Argument "t2smap" must be an existing file.')

    RepLGR.info("TE-dependence analysis was performed on input data.")
    if mask and not t2smap:
        # TODO: add affine check
        LGR.info('Using user-defined mask')
        RepLGR.info("A user-defined mask was applied to the data.")
    elif t2smap and not mask:
        LGR.info('Using user-defined T2* map to generate mask')
        t2s_limited_sec = utils.load_image(t2smap)
        t2s_limited = utils.sec2millisec(t2s_limited_sec)
        t2s_full = t2s_limited.copy()
        mask = (t2s_limited != 0).astype(int)
    elif t2smap and mask:
        LGR.info('Combining user-defined mask and T2* map to generate mask')
        t2s_limited_sec = utils.load_image(t2smap)
        t2s_limited = utils.sec2millisec(t2s_limited_sec)
        t2s_full = t2s_limited.copy()
        mask = utils.load_image(mask)
        mask[t2s_limited == 0] = 0  # reduce mask based on T2* map
    else:
        LGR.info('Computing EPI mask from first echo')
        first_echo_img = io.new_nii_like(io_generator.reference_img,
                                         catd[:, 0, :])
        mask = compute_epi_mask(first_echo_img)
        RepLGR.info("An initial mask was generated from the first echo using "
                    "nilearn's compute_epi_mask function.")

    # Create an adaptive mask with at least 1 good echo, for denoising
    mask_denoise, masksum_denoise = utils.make_adaptive_mask(
        catd,
        mask=mask,
        getsum=True,
        threshold=1,
    )
    LGR.debug('Retaining {}/{} samples for denoising'.format(
        mask_denoise.sum(), n_samp))
    io_generator.save_file(masksum_denoise, "adaptive mask img")

    # Create an adaptive mask with at least 3 good echoes, for classification
    masksum_clf = masksum_denoise.copy()
    masksum_clf[masksum_clf < 3] = 0
    mask_clf = masksum_clf.astype(bool)
    RepLGR.info(
        "A two-stage masking procedure was applied, in which a liberal mask "
        "(including voxels with good data in at least the first echo) was "
        "used for optimal combination, T2*/S0 estimation, and denoising, "
        "while a more conservative mask (restricted to voxels with good data "
        "in at least the first three echoes) was used for the component "
        "classification procedure.")
    LGR.debug('Retaining {}/{} samples for classification'.format(
        mask_clf.sum(), n_samp))

    if t2smap is None:
        LGR.info('Computing T2* map')
        t2s_limited, s0_limited, t2s_full, s0_full = decay.fit_decay(
            catd, tes, mask_denoise, masksum_denoise, fittype)

        # set a hard cap for the T2* map
        # anything that is 10x higher than the 99.5 %ile will be reset to 99.5 %ile
        cap_t2s = stats.scoreatpercentile(t2s_full.flatten(), 99.5,
                                          interpolation_method='lower')
        LGR.debug('Setting cap on T2* map at {:.5f}s'.format(
            utils.millisec2sec(cap_t2s)))
        t2s_full[t2s_full > cap_t2s * 10] = cap_t2s
        io_generator.save_file(utils.millisec2sec(t2s_full), 't2star img')
        io_generator.save_file(s0_full, 's0 img')

        if verbose:
            io_generator.save_file(utils.millisec2sec(t2s_limited),
                                   'limited t2star img')
            io_generator.save_file(s0_limited, 'limited s0 img')

    # optimally combine data
    data_oc = combine.make_optcom(catd, tes, masksum_denoise, t2s=t2s_full,
                                  combmode=combmode)

    # regress out global signal unless explicitly not desired
    if 'gsr' in gscontrol:
        catd, data_oc = gsc.gscontrol_raw(catd, data_oc, n_echos,
                                          io_generator)

    fout = io_generator.save_file(data_oc, 'combined img')
    LGR.info('Writing optimally combined data set: {}'.format(fout))

    if mixm is None:
        # Identify and remove thermal noise from data
        dd, n_components = decomposition.tedpca(catd, data_oc, combmode,
                                                mask_clf, masksum_clf,
                                                t2s_full, io_generator,
                                                tes=tes, algorithm=tedpca,
                                                kdaw=10., rdaw=1.,
                                                verbose=verbose,
                                                low_mem=low_mem)
        if verbose:
            io_generator.save_file(utils.unmask(dd, mask_clf), 'whitened img')

        # Perform ICA, calculate metrics, and apply decision tree
        # Restart when ICA fails to converge or too few BOLD components found
        keep_restarting = True
        n_restarts = 0
        seed = fixed_seed
        while keep_restarting:
            mmix, seed = decomposition.tedica(
                dd, n_components, seed, maxit,
                maxrestart=(maxrestart - n_restarts))
            seed += 1
            n_restarts = seed - fixed_seed

            # Estimate betas and compute selection metrics for mixing matrix
            # generated from dimensionally reduced data using full data
            # (i.e., data with thermal noise)
            LGR.info('Making second component selection guess from ICA '
                     'results')
            required_metrics = [
                'kappa', 'rho', 'countnoise', 'countsigFT2', 'countsigFS0',
                'dice_FT2', 'dice_FS0', 'signal-noise_t',
                'variance explained', 'normalized variance explained',
                'd_table_score'
            ]
            comptable = metrics.collect.generate_metrics(
                catd,
                data_oc,
                mmix,
                masksum_clf,
                tes,
                io_generator,
                'ICA',
                metrics=required_metrics,
            )
            comptable, metric_metadata = selection.kundu_selection_v2(
                comptable, n_echos, n_vols)

            n_bold_comps = comptable[
                comptable.classification == 'accepted'].shape[0]
            if (n_restarts < maxrestart) and (n_bold_comps == 0):
                LGR.warning("No BOLD components found. Re-attempting ICA.")
            elif n_bold_comps == 0:
                LGR.warning("No BOLD components found, but maximum number of "
                            "restarts reached.")
                keep_restarting = False
            else:
                keep_restarting = False

            RepLGR.disabled = True  # Disable the report to avoid duplicate text
        RepLGR.disabled = False  # Re-enable the report after the while loop is escaped
    else:
        LGR.info('Using supplied mixing matrix from ICA')
        mixing_file = io_generator.get_name("ICA mixing tsv")
        mmix = pd.read_table(mixing_file).values

        if ctab is None:
            required_metrics = [
                'kappa', 'rho', 'countnoise', 'countsigFT2', 'countsigFS0',
                'dice_FT2', 'dice_FS0', 'signal-noise_t',
                'variance explained', 'normalized variance explained',
                'd_table_score'
            ]
            comptable = metrics.collect.generate_metrics(
                catd,
                data_oc,
                mmix,
                masksum_clf,
                tes,
                io_generator,
                'ICA',
                metrics=required_metrics,
            )
            comptable, metric_metadata = selection.kundu_selection_v2(
                comptable, n_echos, n_vols)
        else:
            comptable = pd.read_table(ctab)

            if manacc is not None:
                comptable, metric_metadata = selection.manual_selection(
                    comptable, acc=manacc)

    # Write out ICA files.
    comp_names = comptable["Component"].values
    mixing_df = pd.DataFrame(data=mmix, columns=comp_names)
    io_generator.save_file(mixing_df, "ICA mixing tsv")
    betas_oc = utils.unmask(computefeats2(data_oc, mmix, mask_denoise),
                            mask_denoise)
    io_generator.save_file(betas_oc, 'z-scored ICA components img')

    # Save component table and associated json
    io_generator.save_file(comptable, "ICA metrics tsv")
    metric_metadata = metrics.collect.get_metadata(comptable)
    io_generator.save_file(metric_metadata, "ICA metrics json")

    decomp_metadata = {
        "Method": ("Independent components analysis with FastICA "
                   "algorithm implemented by sklearn. "),
    }
    for comp_name in comp_names:
        decomp_metadata[comp_name] = {
            "Description": ("ICA fit to dimensionally-reduced optimally "
                            "combined data."),
            "Method": "tedana",
        }
    with open(io_generator.get_name("ICA decomposition json"), "w") as fo:
        json.dump(decomp_metadata, fo, sort_keys=True, indent=4)

    if comptable[comptable.classification == 'accepted'].shape[0] == 0:
        LGR.warning('No BOLD components detected! Please check data and '
                    'results!')

    mmix_orig = mmix.copy()
    if tedort:
        acc_idx = comptable.loc[
            ~comptable.classification.str.contains('rejected')].index.values
        rej_idx = comptable.loc[
            comptable.classification.str.contains('rejected')].index.values
        acc_ts = mmix[:, acc_idx]
        rej_ts = mmix[:, rej_idx]
        betas = np.linalg.lstsq(acc_ts, rej_ts, rcond=None)[0]
        pred_rej_ts = np.dot(acc_ts, betas)
        resid = rej_ts - pred_rej_ts
        mmix[:, rej_idx] = resid
        comp_names = [io.add_decomp_prefix(comp, prefix='ica',
                                           max_value=comptable.index.max())
                      for comp in comptable.index.values]
        mixing_df = pd.DataFrame(data=mmix, columns=comp_names)
        io_generator.save_file(mixing_df, "ICA orthogonalized mixing tsv")
        RepLGR.info("Rejected components' time series were then "
                    "orthogonalized with respect to accepted components' time "
                    "series.")

    io.writeresults(data_oc,
                    mask=mask_denoise,
                    comptable=comptable,
                    mmix=mmix,
                    n_vols=n_vols,
                    io_generator=io_generator)

    if 'mir' in gscontrol:
        gsc.minimum_image_regression(data_oc, mmix, mask_denoise, comptable,
                                     io_generator)

    if verbose:
        io.writeresults_echoes(catd, mmix, mask_denoise, comptable,
                               io_generator)

    # Write out BIDS-compatible description file
    derivative_metadata = {
        "Name": "tedana Outputs",
        "BIDSVersion": "1.5.0",
        "DatasetType": "derivative",
        "GeneratedBy": [
            {
                "Name": "tedana",
                "Version": __version__,
                "Description": ("A denoising pipeline for the identification "
                                "and removal of non-BOLD noise from "
                                "multi-echo fMRI data."),
                "CodeURL": "https://github.com/ME-ICA/tedana"
            }
        ]
    }
    with open(io_generator.get_name("data description json"), "w") as fo:
        json.dump(derivative_metadata, fo, sort_keys=True, indent=4)

    RepLGR.info("This workflow used numpy (Van Der Walt, Colbert, & "
                "Varoquaux, 2011), scipy (Jones et al., 2001), pandas "
                "(McKinney, 2010), scikit-learn (Pedregosa et al., 2011), "
                "nilearn, and nibabel (Brett et al., 2019).")
    RefLGR.info("Van Der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). "
                "The NumPy array: a structure for efficient numerical "
                "computation. Computing in Science & Engineering, 13(2), 22.")
    RefLGR.info("Jones E, Oliphant T, Peterson P, et al. SciPy: Open Source "
                "Scientific Tools for Python, 2001-, http://www.scipy.org/")
    RefLGR.info("McKinney, W. (2010, June). Data structures for statistical "
                "computing in python. In Proceedings of the 9th Python in "
                "Science Conference (Vol. 445, pp. 51-56).")
    RefLGR.info("Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., "
                "Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). "
                "Scikit-learn: Machine learning in Python. Journal of machine "
                "learning research, 12(Oct), 2825-2830.")
    RefLGR.info("Brett, M., Markiewicz, C. J., Hanke, M., Côté, M.-A., "
                "Cipollini, B., McCarthy, P., … freec84. (2019, May 28). "
                "nipy/nibabel. Zenodo. http://doi.org/10.5281/zenodo.3233118")
    RepLGR.info("This workflow also used the Dice similarity index "
                "(Dice, 1945; Sørensen, 1948).")
    RefLGR.info("Dice, L. R. (1945). Measures of the amount of ecologic "
                "association between species. Ecology, 26(3), 297-302.")
    RefLGR.info("Sørensen, T. J. (1948). A method of establishing groups of "
                "equal amplitude in plant sociology based on similarity of "
                "species content and its application to analyses of the "
                "vegetation on Danish commons. I kommission hos E. "
                "Munksgaard.")

    with open(repname, 'r') as fo:
        report = [line.rstrip() for line in fo.readlines()]
        report = ' '.join(report)
    with open(refname, 'r') as fo:
        reference_list = sorted(list(set(fo.readlines())))
        references = '\n'.join(reference_list)
    report += '\n\nReferences:\n\n' + references
    with open(repname, 'w') as fo:
        fo.write(report)

    if not no_reports:
        LGR.info('Making figures folder with static component maps and '
                 'timecourse plots.')

        dn_ts, hikts, lowkts = io.denoise_ts(data_oc, mmix, mask_denoise,
                                             comptable)

        reporting.static_figures.carpet_plot(
            optcom_ts=data_oc,
            denoised_ts=dn_ts,
            hikts=hikts,
            lowkts=lowkts,
            mask=mask_denoise,
            io_generator=io_generator,
            gscontrol=gscontrol,
        )
        reporting.static_figures.comp_figures(
            data_oc,
            mask=mask_denoise,
            comptable=comptable,
            mmix=mmix_orig,
            io_generator=io_generator,
            png_cmap=png_cmap,
        )

        if sys.version_info.major == 3 and sys.version_info.minor < 6:
            warn_msg = ("Reports requested but Python version is less than "
                        "3.6.0. Dynamic reports will not be generated.")
            LGR.warn(warn_msg)
        else:
            LGR.info('Generating dynamic report')
            reporting.generate_report(io_generator, tr=img_t_r)

    LGR.info('Workflow completed')
    utils.teardown_loggers()
    os.remove(refname)
def tedpca(data_cat, data_oc, combmode, mask, t2s, t2sG, ref_img, tes, algorithm='mdl', source_tes=-1, kdaw=10., rdaw=1., out_dir='.', verbose=False, low_mem=False): """ Use principal components analysis (PCA) to identify and remove thermal noise from multi-echo data. Parameters ---------- data_cat : (S x E x T) array_like Input functional data data_oc : (S x T) array_like Optimally combined time series data combmode : {'t2s', 'paid'} str How optimal combination of echos should be made, where 't2s' indicates using the method of Posse 1999 and 'paid' indicates using the method of Poser 2006 mask : (S,) array_like Boolean mask array t2s : (S,) array_like Map of voxel-wise T2* estimates. t2sG : (S,) array_like Map of voxel-wise T2* estimates. ref_img : :obj:`str` or img_like Reference image to dictate how outputs are saved to disk tes : :obj:`list` List of echo times associated with `data_cat`, in milliseconds algorithm : {'mle', 'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic'}, optional Method with which to select components in TEDPCA. Default is 'mdl'. PCA decomposition with the mdl, kic and aic options are based on a Moving Average (stationary Gaussian) process and are ordered from most to least aggresive. See (Li et al., 2007). source_tes : :obj:`int` or :obj:`list` of :obj:`int`, optional Which echos to use in PCA. Values -1 and 0 are special, where a value of -1 will indicate using the optimal combination of the echos and 0 will indicate using all the echos. A list can be provided to indicate a subset of echos. Default: -1 kdaw : :obj:`float`, optional Dimensionality augmentation weight for Kappa calculations. Must be a non-negative float, or -1 (a special value). Default is 10. rdaw : :obj:`float`, optional Dimensionality augmentation weight for Rho calculations. Must be a non-negative float, or -1 (a special value). Default is 1. out_dir : :obj:`str`, optional Output directory. verbose : :obj:`bool`, optional Whether to output files from fitmodels_direct or not. Default: False low_mem : :obj:`bool`, optional Whether to use incremental PCA (for low-memory systems) or not. Default: False Returns ------- kept_data : (S x T) :obj:`numpy.ndarray` Dimensionally reduced optimally combined functional data n_components : :obj:`int` Number of components retained from PCA decomposition Notes ----- ====================== ================================================= Notation Meaning ====================== ================================================= :math:`\\kappa` Component pseudo-F statistic for TE-dependent (BOLD) model. :math:`\\rho` Component pseudo-F statistic for TE-independent (artifact) model. :math:`v` Voxel :math:`V` Total number of voxels in mask :math:`\\zeta` Something :math:`c` Component :math:`p` Something else ====================== ================================================= Steps: 1. Variance normalize either multi-echo or optimally combined data, depending on settings. 2. Decompose normalized data using PCA or SVD. 3. Compute :math:`{\\kappa}` and :math:`{\\rho}`: .. math:: {\\kappa}_c = \\frac{\\sum_{v}^V {\\zeta}_{c,v}^p * \ F_{c,v,R_2^*}}{\\sum {\\zeta}_{c,v}^p} {\\rho}_c = \\frac{\\sum_{v}^V {\\zeta}_{c,v}^p * \ F_{c,v,S_0}}{\\sum {\\zeta}_{c,v}^p} 4. Some other stuff. Something about elbows. 5. Classify components as thermal noise if they meet both of the following criteria: - Nonsignificant :math:`{\\kappa}` and :math:`{\\rho}`. - Nonsignificant variance explained. 
    Outputs:

    This function writes out several files:

    ====================== =================================================
    Filename               Content
    ====================== =================================================
    pca_decomposition.json PCA component table.
    pca_mixing.tsv         PCA mixing matrix.
    pca_components.nii.gz  Component weight maps.
    ====================== =================================================
    """
    if low_mem and algorithm == 'mle':
        LGR.warning('Low memory option is not compatible with MLE '
                    'dimensionality estimation. Switching to Kundu decision '
                    'tree.')
        algorithm = 'kundu'

    if algorithm == 'mle':
        alg_str = "using MLE dimensionality estimation (Minka, 2001)"
        RefLGR.info("Minka, T. P. (2001). Automatic choice of dimensionality "
                    "for PCA. In Advances in neural information processing "
                    "systems (pp. 598-604).")
    elif algorithm == 'kundu':
        alg_str = ("followed by the Kundu component selection decision "
                   "tree (Kundu et al., 2013)")
        RefLGR.info("Kundu, P., Brenowitz, N. D., Voon, V., Worbe, Y., "
                    "Vértes, P. E., Inati, S. J., ... & Bullmore, E. T. "
                    "(2013). Integrated strategy for improving functional "
                    "connectivity mapping using multiecho fMRI. Proceedings "
                    "of the National Academy of Sciences, 110(40), "
                    "16187-16192.")
    elif algorithm == 'kundu-stabilize':
        alg_str = ("followed by the 'stabilized' Kundu component "
                   "selection decision tree (Kundu et al., 2013)")
        RefLGR.info("Kundu, P., Brenowitz, N. D., Voon, V., Worbe, Y., "
                    "Vértes, P. E., Inati, S. J., ... & Bullmore, E. T. "
                    "(2013). Integrated strategy for improving functional "
                    "connectivity mapping using multiecho fMRI. Proceedings "
                    "of the National Academy of Sciences, 110(40), "
                    "16187-16192.")
    else:
        alg_str = ("based on the PCA component estimation with a Moving "
                   "Average (stationary Gaussian) process (Li et al., 2007)")
        RefLGR.info("Li, Y.O., Adalı, T. and Calhoun, V.D., (2007). "
                    "Estimating the number of independent components for "
                    "functional magnetic resonance imaging data. "
                    "Human brain mapping, 28(11), pp.1251-1266.")

    if source_tes == -1:
        dat_str = "the optimally combined data"
    elif source_tes == 0:
        dat_str = "the z-concatenated multi-echo data"
    else:
        dat_str = "a z-concatenated subset of echoes from the input data"

    RepLGR.info("Principal component analysis {0} was applied to "
                "{1} for dimensionality reduction.".format(alg_str, dat_str))

    n_samp, n_echos, n_vols = data_cat.shape
    source_tes = np.array([int(ee) for ee in str(source_tes).split(',')])

    if len(source_tes) == 1 and source_tes[0] == -1:
        LGR.info('Computing PCA of optimally combined multi-echo data')
        data = data_oc[mask, :][:, np.newaxis, :]
    elif len(source_tes) == 1 and source_tes[0] == 0:
        LGR.info('Computing PCA of spatially concatenated multi-echo data')
        data = data_cat[mask, ...]
    else:
        LGR.info('Computing PCA of echo #{0}'.format(
            ','.join([str(ee) for ee in source_tes])))
        data = np.stack([data_cat[mask, ee, :] for ee in source_tes - 1], axis=1)

    eim = np.squeeze(_utils.eimask(data))
    data = np.squeeze(data[eim])

    data_z = ((data.T - data.T.mean(axis=0)) / data.T.std(axis=0)).T  # var normalize ts
    data_z = (data_z - data_z.mean()) / data_z.std()  # var normalize everything

    if algorithm in ['mdl', 'aic', 'kic']:
        data_img = io.new_nii_like(
            ref_img, utils.unmask(utils.unmask(data, eim), mask))
        mask_img = io.new_nii_like(ref_img,
                                   utils.unmask(eim, mask).astype(int))
        voxel_comp_weights, varex, varex_norm, comp_ts = ma_pca.ma_pca(
            data_img, mask_img, algorithm)
    elif algorithm == 'mle':
        voxel_comp_weights, varex, varex_norm, comp_ts = run_mlepca(data_z)
    elif low_mem:
        voxel_comp_weights, varex, comp_ts = low_mem_pca(data_z)
        varex_norm = varex / varex.sum()
    else:
        ppca = PCA(copy=False, n_components=(n_vols - 1))
        ppca.fit(data_z)
        comp_ts = ppca.components_.T
        varex = ppca.explained_variance_
        voxel_comp_weights = np.dot(np.dot(data_z, comp_ts),
                                    np.diag(1. / varex))
        varex_norm = varex / varex.sum()

    # Compute Kappa and Rho for PCA comps
    eimum = np.atleast_2d(eim)
    eimum = np.transpose(eimum, np.argsort(eimum.shape)[::-1])
    eimum = eimum.prod(axis=1)
    o = np.zeros((mask.shape[0], *eimum.shape[1:]))
    o[mask, ...] = eimum
    eimum = np.squeeze(o).astype(bool)

    # Normalize each component's time series
    vTmixN = stats.zscore(comp_ts, axis=0)
    comptable, _, _, _ = metrics.dependence_metrics(
        data_cat, data_oc, comp_ts, t2s, tes, ref_img,
        reindex=False, mmixN=vTmixN, algorithm=None,
        label='mepca_', out_dir=out_dir, verbose=verbose)

    # varex_norm from PCA overrides varex_norm from dependence_metrics,
    # but we retain the original
    comptable['estimated normalized variance explained'] = \
        comptable['normalized variance explained']
    comptable['normalized variance explained'] = varex_norm

    # write component maps to 4D image
    comp_ts_z = stats.zscore(comp_ts, axis=0)
    comp_maps = utils.unmask(computefeats2(data_oc, comp_ts_z, mask), mask)
    io.filewrite(comp_maps, op.join(out_dir, 'pca_components.nii.gz'), ref_img)

    # Select components using decision tree
    if algorithm == 'kundu':
        comptable = kundu_tedpca(comptable, n_echos, kdaw, rdaw,
                                 stabilize=False)
    elif algorithm == 'kundu-stabilize':
        comptable = kundu_tedpca(comptable, n_echos, kdaw, rdaw,
                                 stabilize=True)
    elif algorithm == 'mle':
        LGR.info('Selected {0} components with MLE dimensionality '
                 'detection'.format(comptable.shape[0]))
        comptable['classification'] = 'accepted'
        comptable['rationale'] = ''
    elif algorithm in ['mdl', 'aic', 'kic']:
        LGR.info('Selected {0} components with {1} dimensionality '
                 'detection'.format(comptable.shape[0], algorithm))
        comptable['classification'] = 'accepted'
        comptable['rationale'] = ''

    # Save decomposition
    comp_names = [io.add_decomp_prefix(comp, prefix='pca',
                                       max_value=comptable.index.max())
                  for comp in comptable.index.values]
    mixing_df = pd.DataFrame(data=comp_ts, columns=comp_names)
    mixing_df.to_csv(op.join(out_dir, 'pca_mixing.tsv'), sep='\t', index=False)

    # Compare element-wise to avoid ambiguous truthiness of the source_tes
    # array (it was converted to a numpy array above)
    data_type = ('optimally combined data'
                 if (len(source_tes) == 1 and source_tes[0] == -1)
                 else 'z-concatenated data')
    comptable['Description'] = 'PCA fit to {0}.'.format(data_type)
    mmix_dict = {}
    mmix_dict['Method'] = ('Principal components analysis implemented by '
                           'sklearn. Components are sorted by variance '
                           'explained in descending order. '
                           'Component signs are flipped to best match the '
                           'data.')
    io.save_comptable(comptable, op.join(out_dir, 'pca_decomposition.json'),
                      label='pca', metadata=mmix_dict)

    acc = comptable[comptable.classification == 'accepted'].index.values
    n_components = acc.size
    voxel_kept_comp_weighted = (voxel_comp_weights[:, acc] * varex[None, acc])
    kept_data = np.dot(voxel_kept_comp_weighted, comp_ts[:, acc].T)

    kept_data = stats.zscore(kept_data, axis=1)  # variance normalize time series
    kept_data = stats.zscore(kept_data, axis=None)  # variance normalize everything

    return kept_data, n_components
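
# A minimal, runnable sketch of the reduce/reconstruct pattern used in the
# default PCA branch above: variance-normalize, decompose, keep a subset of
# components, and project back. The synthetic data shape and the number of
# kept components are illustrative assumptions, not tedana defaults.
def _pca_reduce_sketch(data, n_keep=3):
    """Toy version of the kept-component reconstruction performed above."""
    import numpy as np
    from scipy import stats
    from sklearn.decomposition import PCA

    # Variance normalize each voxel's time series, then the whole array,
    # mirroring the data_z computation in tedpca
    data_z = stats.zscore(data, axis=1)
    data_z = stats.zscore(data_z, axis=None)

    ppca = PCA(n_components=data.shape[1] - 1)
    ppca.fit(data_z)
    comp_ts = ppca.components_.T  # (T x C) component time series
    varex = ppca.explained_variance_

    # Voxel-wise component weights, as in the non-maPCA branches above
    weights = np.dot(np.dot(data_z, comp_ts), np.diag(1. / varex))

    # Reconstruct the data from the first n_keep components only
    kept = np.dot(weights[:, :n_keep] * varex[None, :n_keep],
                  comp_ts[:, :n_keep].T)
    return stats.zscore(stats.zscore(kept, axis=1), axis=None)

# Example with synthetic (voxels x volumes) data:
# _pca_reduce_sketch(np.random.randn(500, 100), n_keep=5)
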
def tedpca(
    data_cat,
    data_oc,
    combmode,
    mask,
    adaptive_mask,
    t2sG,
    io_generator,
    tes,
    algorithm="aic",
    kdaw=10.0,
    rdaw=1.0,
    verbose=False,
    low_mem=False,
):
    """
    Use principal components analysis (PCA) to identify and remove thermal
    noise from multi-echo data.

    Parameters
    ----------
    data_cat : (S x E x T) array_like
        Input functional data
    data_oc : (S x T) array_like
        Optimally combined time series data
    combmode : {'t2s', 'paid'} str
        How optimal combination of echoes should be made, where 't2s'
        indicates using the method of Posse 1999 and 'paid' indicates using
        the method of Poser 2006
    mask : (S,) array_like
        Boolean mask array
    adaptive_mask : (S,) array_like
        Array where each value indicates the number of echoes with good
        signal for that voxel. This mask may be thresholded; for example,
        with values less than 3 set to 0.
        For more information on thresholding, see `make_adaptive_mask`.
    t2sG : (S,) array_like
        Map of voxel-wise T2* estimates.
    io_generator : :obj:`tedana.io.OutputGenerator`
        The output generation object for this workflow
    tes : :obj:`list`
        List of echo times associated with `data_cat`, in milliseconds
    algorithm : {'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic', float, int}, optional
        Method with which to select components in TEDPCA. PCA decomposition
        with the mdl, kic and aic options is based on a Moving Average
        (stationary Gaussian) process, and the options are ordered from most
        to least aggressive (see Li et al., 2007). If a float is provided,
        it is assumed to represent the fraction of variance explained (0-1)
        to retain from PCA. If an int is provided, it is assumed to be the
        number of components to select. Default is 'aic'.
    kdaw : :obj:`float`, optional
        Dimensionality augmentation weight for Kappa calculations. Must be a
        non-negative float, or -1 (a special value). Default is 10.
    rdaw : :obj:`float`, optional
        Dimensionality augmentation weight for Rho calculations. Must be a
        non-negative float, or -1 (a special value). Default is 1.
    verbose : :obj:`bool`, optional
        Whether to output files from fitmodels_direct or not. Default: False
    low_mem : :obj:`bool`, optional
        Whether to use incremental PCA (for low-memory systems) or not.
        This is only compatible with the "kundu" or "kundu-stabilize"
        algorithms. Default: False

    Returns
    -------
    kept_data : (S x T) :obj:`numpy.ndarray`
        Dimensionally reduced optimally combined functional data
    n_components : :obj:`int`
        Number of components retained from PCA decomposition

    Notes
    -----
    ====================== =================================================
    Notation               Meaning
    ====================== =================================================
    :math:`\\kappa`        Component pseudo-F statistic for TE-dependent
                           (BOLD) model.
    :math:`\\rho`          Component pseudo-F statistic for TE-independent
                           (artifact) model.
    :math:`v`              Voxel
    :math:`V`              Total number of voxels in mask
    :math:`\\zeta`         Voxel-wise component weight map
    :math:`c`              Component
    :math:`p`              Exponent applied to the component weight maps
                           when computing weighted averages
    ====================== =================================================

    Steps:

    1.  Variance normalize either multi-echo or optimally combined data,
        depending on settings.
    2.  Decompose normalized data using PCA or SVD.
    3.  Compute :math:`{\\kappa}` and :math:`{\\rho}`:

            .. math::
                {\\kappa}_c = \\frac{\\sum_{v}^V {\\zeta}_{c,v}^p * \
                      F_{c,v,R_2^*}}{\\sum {\\zeta}_{c,v}^p}

                {\\rho}_c = \\frac{\\sum_{v}^V {\\zeta}_{c,v}^p * \
                      F_{c,v,S_0}}{\\sum {\\zeta}_{c,v}^p}

    4.  Estimate thresholds ("elbows") from the :math:`{\\kappa}`,
        :math:`{\\rho}`, and variance explained distributions; these serve
        as cutoffs for the next step.
    5.  Classify components as thermal noise if they meet both of the
        following criteria:

            - Nonsignificant :math:`{\\kappa}` and :math:`{\\rho}`.
            - Nonsignificant variance explained.

    Outputs:

    This function writes out several files:

    =========================== =============================================
    Default Filename            Content
    =========================== =============================================
    desc-PCA_metrics.tsv        PCA component table
    desc-PCA_metrics.json       Metadata sidecar file describing the
                                component table
    desc-PCA_mixing.tsv         PCA mixing matrix
    desc-PCA_components.nii.gz  Component weight maps
    desc-PCA_decomposition.json Metadata sidecar file describing the PCA
                                decomposition
    =========================== =============================================

    See Also
    --------
    :func:`tedana.utils.make_adaptive_mask` : The function used to create
        the ``adaptive_mask`` parameter.
    :py:mod:`tedana.constants` : The module describing the filenames for
        various naming conventions
    """
    if algorithm == "kundu":
        alg_str = "followed by the Kundu component selection decision tree (Kundu et al., 2013)"
        RefLGR.info(
            "Kundu, P., Brenowitz, N. D., Voon, V., Worbe, Y., "
            "Vértes, P. E., Inati, S. J., ... & Bullmore, E. T. "
            "(2013). Integrated strategy for improving functional "
            "connectivity mapping using multiecho fMRI. Proceedings "
            "of the National Academy of Sciences, 110(40), "
            "16187-16192."
        )
    elif algorithm == "kundu-stabilize":
        alg_str = (
            "followed by the 'stabilized' Kundu component "
            "selection decision tree (Kundu et al., 2013)"
        )
        RefLGR.info(
            "Kundu, P., Brenowitz, N. D., Voon, V., Worbe, Y., "
            "Vértes, P. E., Inati, S. J., ... & Bullmore, E. T. "
            "(2013). Integrated strategy for improving functional "
            "connectivity mapping using multiecho fMRI. Proceedings "
            "of the National Academy of Sciences, 110(40), "
            "16187-16192."
        )
    elif isinstance(algorithm, Number):
        if isinstance(algorithm, float):
            alg_str = (
                "in which the number of components was determined based on a "
                "variance explained threshold"
            )
        else:
            alg_str = "in which the number of components is pre-defined"
    else:
        alg_str = (
            "based on the PCA component estimation with a Moving Average "
            "(stationary Gaussian) process (Li et al., 2007)"
        )
        RefLGR.info(
            "Li, Y.O., Adalı, T. and Calhoun, V.D., (2007). "
            "Estimating the number of independent components for "
            "functional magnetic resonance imaging data. "
            "Human brain mapping, 28(11), pp.1251-1266."
        )

    RepLGR.info(
        "Principal component analysis {0} was applied to "
        "the optimally combined data for dimensionality "
        "reduction.".format(alg_str)
    )

    n_samp, n_echos, n_vols = data_cat.shape

    LGR.info(
        "Computing PCA of optimally combined multi-echo data with selection "
        f"criteria: {algorithm}"
    )
    data = data_oc[mask, :]

    data_z = ((data.T - data.T.mean(axis=0)) / data.T.std(axis=0)).T  # var normalize ts
    data_z = (data_z - data_z.mean()) / data_z.std()  # var normalize everything

    if algorithm in ["mdl", "aic", "kic"]:
        data_img = io.new_nii_like(io_generator.reference_img, utils.unmask(data, mask))
        mask_img = io.new_nii_like(io_generator.reference_img, mask.astype(int))
        ma_pca = MovingAveragePCA(criterion=algorithm, normalize=True)
        _ = ma_pca.fit_transform(data_img, mask_img)

        # Extract results from maPCA
        voxel_comp_weights = ma_pca.u_
        varex = ma_pca.explained_variance_
        varex_norm = ma_pca.explained_variance_ratio_
        comp_ts = ma_pca.components_.T
        aic = ma_pca.aic_
        kic = ma_pca.kic_
        mdl = ma_pca.mdl_
        varex_90 = ma_pca.varexp_90_
        varex_95 = ma_pca.varexp_95_
        all_comps = ma_pca.all_

        # Extract number of components and variance explained for logging and plotting
        n_aic = aic["n_components"]
        aic_varexp = np.round(aic["explained_variance_total"], 3)
        n_kic = kic["n_components"]
        kic_varexp = np.round(kic["explained_variance_total"], 3)
        n_mdl = mdl["n_components"]
        mdl_varexp = np.round(mdl["explained_variance_total"], 3)
        n_varex_90 = varex_90["n_components"]
        varex_90_varexp = np.round(varex_90["explained_variance_total"], 3)
        n_varex_95 = varex_95["n_components"]
        varex_95_varexp = np.round(varex_95["explained_variance_total"], 3)
        all_varex = np.round(all_comps["explained_variance_total"], 3)

        # Print out the results
        LGR.info("Optimal number of components based on different criteria:")
        LGR.info(
            f"AIC: {n_aic} | KIC: {n_kic} | MDL: {n_mdl} | 90% varexp: {n_varex_90} "
            f"| 95% varexp: {n_varex_95}"
        )

        LGR.info("Explained variance based on different criteria:")
        LGR.info(
            f"AIC: {aic_varexp}% | KIC: {kic_varexp}% | MDL: {mdl_varexp}% | "
            f"90% varexp: {varex_90_varexp}% | 95% varexp: {varex_95_varexp}%"
        )

        pca_optimization_curves = np.array([aic["value"], kic["value"], mdl["value"]])
        pca_criteria_components = np.array(
            [
                n_aic,
                n_kic,
                n_mdl,
                n_varex_90,
                n_varex_95,
            ]
        )

        # Plot maPCA optimization curves
        LGR.info("Plotting maPCA optimization curves")
        plot_pca_results(
            pca_optimization_curves, pca_criteria_components, all_varex, io_generator
        )

        # Save maPCA results into a dictionary
        mapca_results = {
            "aic": {
                "n_components": n_aic,
                "explained_variance_total": aic_varexp,
                "curve": aic["value"],
            },
            "kic": {
                "n_components": n_kic,
                "explained_variance_total": kic_varexp,
                "curve": kic["value"],
            },
            "mdl": {
                "n_components": n_mdl,
                "explained_variance_total": mdl_varexp,
                "curve": mdl["value"],
            },
            "varex_90": {
                "n_components": n_varex_90,
                "explained_variance_total": varex_90_varexp,
            },
            "varex_95": {
                "n_components": n_varex_95,
                "explained_variance_total": varex_95_varexp,
            },
        }

        # Save dictionary
        io_generator.save_file(mapca_results, "PCA cross component metrics json")

    elif isinstance(algorithm, Number):
        ppca = PCA(copy=False, n_components=algorithm, svd_solver="full")
        ppca.fit(data_z)
        comp_ts = ppca.components_.T
        varex = ppca.explained_variance_
        voxel_comp_weights = np.dot(np.dot(data_z, comp_ts), np.diag(1.0 / varex))
        varex_norm = ppca.explained_variance_ratio_
    elif low_mem:
        voxel_comp_weights, varex, varex_norm, comp_ts = low_mem_pca(data_z)
    else:
        ppca = PCA(copy=False, n_components=(n_vols - 1))
        ppca.fit(data_z)
        comp_ts = ppca.components_.T
        varex = ppca.explained_variance_
        voxel_comp_weights = np.dot(np.dot(data_z, comp_ts), np.diag(1.0 / varex))
        varex_norm = ppca.explained_variance_ratio_

    # Compute Kappa and Rho for PCA comps
    required_metrics = [
        "kappa",
        "rho",
        "countnoise",
        "countsigFT2",
        "countsigFS0",
        "dice_FT2",
        "dice_FS0",
        "signal-noise_t",
        "variance explained",
        "normalized variance explained",
        "d_table_score",
    ]
    comptable = metrics.collect.generate_metrics(
        data_cat,
        data_oc,
        comp_ts,
        adaptive_mask,
        tes,
        io_generator,
        "PCA",
        metrics=required_metrics,
    )

    # varex_norm from PCA overrides varex_norm from generate_metrics,
    # but we retain the original
    comptable["estimated normalized variance explained"] = comptable[
        "normalized variance explained"
    ]
    comptable["normalized variance explained"] = varex_norm

    # write component maps to 4D image
    comp_maps = utils.unmask(computefeats2(data_oc, comp_ts, mask), mask)
    io_generator.save_file(comp_maps, "z-scored PCA components img")

    # Select components using decision tree
    if algorithm == "kundu":
        comptable, metric_metadata = kundu_tedpca(
            comptable,
            n_echos,
            kdaw,
            rdaw,
            stabilize=False,
        )
    elif algorithm == "kundu-stabilize":
        comptable, metric_metadata = kundu_tedpca(
            comptable,
            n_echos,
            kdaw,
            rdaw,
            stabilize=True,
        )
    else:
        if isinstance(algorithm, float):
            alg_str = "variance explained-based"
        elif isinstance(algorithm, int):
            alg_str = "a fixed number of components and no"
        else:
            alg_str = algorithm
        LGR.info(
            f"Selected {comptable.shape[0]} components with "
            f"{round(100 * varex_norm.sum(), 2)}% normalized variance "
            f"explained using {alg_str} dimensionality detection"
        )
        comptable["classification"] = "accepted"
        comptable["rationale"] = ""

    # Save decomposition files
    comp_names = [
        io.add_decomp_prefix(comp, prefix="pca", max_value=comptable.index.max())
        for comp in comptable.index.values
    ]

    mixing_df = pd.DataFrame(data=comp_ts, columns=comp_names)
    io_generator.save_file(mixing_df, "PCA mixing tsv")

    # Save component table and associated json
    io_generator.save_file(comptable, "PCA metrics tsv")
    metric_metadata = metrics.collect.get_metadata(comptable)
    io_generator.save_file(metric_metadata, "PCA metrics json")

    decomp_metadata = {
        "Method": (
            "Principal components analysis implemented by sklearn. "
            "Components are sorted by variance explained in descending order. "
        ),
    }
    for comp_name in comp_names:
        decomp_metadata[comp_name] = {
            "Description": "PCA fit to optimally combined data.",
            "Method": "tedana",
        }
    io_generator.save_file(decomp_metadata, "PCA decomposition json")

    acc = comptable[comptable.classification == "accepted"].index.values
    n_components = acc.size
    voxel_kept_comp_weighted = voxel_comp_weights[:, acc] * varex[None, acc]
    kept_data = np.dot(voxel_kept_comp_weighted, comp_ts[:, acc].T)

    kept_data = stats.zscore(kept_data, axis=1)  # variance normalize time series
    kept_data = stats.zscore(kept_data, axis=None)  # variance normalize everything

    return kept_data, n_components
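
# A short, runnable illustration of the variance-threshold selection path
# above: passing a float in (0, 1) as ``n_components`` together with
# ``svd_solver="full"`` makes sklearn's PCA keep just enough components to
# reach that fraction of explained variance, while an int keeps a fixed
# count. The synthetic data and the 0.9 threshold are illustrative only.
def _varex_threshold_demo():
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    toy = rng.standard_normal((500, 100))  # stand-in for (voxels x volumes) data
    ppca = PCA(n_components=0.9, svd_solver="full")
    ppca.fit(toy)
    # Number of components retained and the variance fraction they explain
    return ppca.n_components_, ppca.explained_variance_ratio_.sum()
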