def extract_outlier_frames(config,
                           videos,
                           videotype='avi',
                           shuffle=1,
                           trainingsetindex=0,
                           outlieralgorithm='jump',
                           comparisonbodyparts='all',
                           epsilon=20,
                           p_bound=.01,
                           ARdegree=3,
                           MAdegree=1,
                           alpha=.01,
                           extractionalgorithm='kmeans',
                           automatic=False,
                           cluster_resizewidth=30,
                           cluster_color=False,
                           opencv=True,
                           savelabeled=True,
                           destfolder=None):
    """
    Extracts outlier frames for a video in case the predictions are not correct, considering only the part of
    the video running from 'start' to 'stop' as defined in config.yaml.

    Another crucial parameter in config.yaml is how many frames to extract: 'numframes2pick'.

    Parameters
    ----------
    config : string
        Full path of the config.yaml file as a string.

    videos : list
        A list of strings containing the full paths to videos for analysis or a path to the directory, where all the videos with same extension are stored.

    videotype: string, optional
        Checks for the extension of the video in case the input is a directory.\n Only videos with this extension are analyzed. The default is ``.avi``

    shuffle : int, optional
        The shuffle index of training dataset. The extracted frames will be stored in the labeled-dataset for
        the corresponding shuffle of training dataset. Default is set to 1

    trainingsetindex: int, optional
        Integer specifying which TrainingFraction to use. By default the first (note that TrainingFraction is a list in config.yaml).

    outlieralgorithm: 'fitting', 'jump', 'uncertain', or 'manual'
        String specifying the algorithm used to detect the outliers. Currently, deeplabcut supports three methods + a manual GUI option. 'fitting'
        fits an AutoRegressive Integrated Moving Average (ARIMA) model to the data and computes the distance to the estimated data. Distances larger than
        epsilon are then potentially identified as outliers. The method 'jump' identifies jumps larger than 'epsilon' in any body part; and 'uncertain'
        looks for frames with confidence below p_bound. The default is set to ``jump``.

    comparisonbodyparts: list of strings, optional
        This selects the body parts for which the comparisons with the outliers are carried out. Either ``all``, then all body parts
        from config.yaml are used, or a list of strings that are a subset of the full list.
        E.g. ['hand','Joystick'] for the demo Reaching-Mackenzie-2018-08-30/config.yaml to select only these two body parts.

    p_bound: float between 0 and 1, optional
        For outlieralgorithm 'uncertain' this parameter defines the likelihood below which a body part will be flagged as a putative outlier.

    epsilon: float, optional
        Meaning depends on outlieralgorithm. The default is set to 20 pixels.
        For outlieralgorithm 'fitting': float bound according to which frames are picked when the (average) body part estimate deviates from the model fit.
        For outlieralgorithm 'jump': float bound specifying the distance by which body points jump from one frame to the next (Euclidean distance).

    ARdegree: int, optional
        For outlieralgorithm 'fitting': autoregressive degree of the SARIMAX model. (Note we use SARIMAX without exogenous and seasonal parts.)
        see https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

    MAdegree: int, optional
        For outlieralgorithm 'fitting': moving average degree of the SARIMAX model. (Note we use SARIMAX without exogenous and seasonal parts.)
        See https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

    alpha: float, optional
        Significance level for detecting outliers based on the confidence interval of the fitted ARIMA model. However, only the distance is used.

    extractionalgorithm : string, optional
        String specifying the algorithm to use for selecting the frames from the identified putative outlier frames. Currently, deeplabcut
        supports either ``kmeans`` or ``uniform`` based selection (same logic as for extract_frames).
        The default is set to ``kmeans``; if provided it must be either ``uniform`` or ``kmeans``.

    automatic : bool, optional
        Set it to True if you want to extract outliers without being asked for user feedback.

    cluster_resizewidth: number, default: 30
        For k-means one can change the width to which the images are downsampled (aspect ratio is fixed).

    cluster_color: bool, default: False
        If false then each downsampled image is treated as a grayscale vector (discarding color information). If true, then the color channels are considered. This increases
        the computational complexity.

    opencv: bool, default: True
        Uses OpenCV for loading & extracting frames (otherwise moviepy (legacy)).

    savelabeled: bool, default: True
        If True, also saves frames with predicted labels in each folder.

    destfolder: string, optional
        Specifies the destination folder that was used for storing analysis data (default is the path of the video).

    Examples
    --------
    Windows example for extracting the frames with default settings
    >>> deeplabcut.extract_outlier_frames('C:\\myproject\\reaching-task\\config.yaml',['C:\\yourusername\\rig-95\\Videos\\reachingvideo1.avi'])

    For extracting the frames with default settings
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'])

    For extracting the frames with kmeans
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'],extractionalgorithm='kmeans')

    For extracting the frames with kmeans and epsilon = 5 pixels
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'],epsilon=5,extractionalgorithm='kmeans')
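
    For extracting frames whose likelihood falls below p_bound = 0.1 (same demo paths as above, using the 'uncertain' detector)
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'],outlieralgorithm='uncertain',p_bound=0.1)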
    """

    cfg = auxiliaryfunctions.read_config(config)
    DLCscorer, DLCscorerlegacy = auxiliaryfunctions.GetScorerName(
        cfg, shuffle, trainFraction=cfg['TrainingFraction'][trainingsetindex])
    Videos = auxiliaryfunctions.Getlistofvideos(videos, videotype)
    for video in Videos:
        if destfolder is None:
            videofolder = str(Path(video).parents[0])
        else:
            videofolder = destfolder

        notanalyzed, dataname, DLCscorer = auxiliaryfunctions.CheckifNotAnalyzed(
            videofolder,
            str(Path(video).stem),
            DLCscorer,
            DLCscorerlegacy,
            flag='checking')
        if notanalyzed:
            print(
                "It seems the video has not been analyzed yet, or the video is not found! You can only refine the labels after the a video is analyzed. Please run 'analyze_video' first. Or, please double check your video file path"
            )
        else:
            Dataframe = pd.read_hdf(dataname, 'df_with_missing')
            scorer = Dataframe.columns.get_level_values(0)[0]  # reading scorer from the dataframe header
            nframes = np.size(Dataframe.index)
            # extract min and max index based on start stop interval.
            startindex = max([int(np.floor(nframes * cfg['start'])), 0])
            stopindex = min([int(np.ceil(nframes * cfg['stop'])), nframes])
            Index = np.arange(stopindex - startindex) + startindex
            #figure out body part list:
            bodyparts = auxiliaryfunctions.IntersectionofBodyPartsandOnesGivenbyUser(
                cfg, comparisonbodyparts)

            Indices = []
            if outlieralgorithm == 'uncertain':  # necessary parameters: considered body parts and p_bound
                for bpindex, bp in enumerate(bodyparts):
                    if bp in cfg['bodyparts']:  # filter [who knows what users put in...]
                        p = Dataframe[scorer][bp]['likelihood'].values[Index]
                        Indices.extend(
                            np.where(p < p_bound)[0] + startindex
                        )  # all indices between start and stop that are below p_bound.
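                        # Each frame where this body part's likelihood drops below p_bound is
                        # added; duplicates across body parts are removed further below via set().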

            elif outlieralgorithm == 'jump':
                for bpindex, bp in enumerate(bodyparts):
                    if bp in cfg['bodyparts']:  # filter [who knows what users put in...]
                        dx = np.diff(Dataframe[scorer][bp]['x'].values[Index])
                        dy = np.diff(Dataframe[scorer][bp]['y'].values[Index])
                        # all indices between start and stop with jump larger than epsilon (leading up to this point!)
                        Indices.extend(
                            np.where((dx**2 + dy**2) > epsilon**2)[0] +
                            startindex + 1)
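                # A frame is flagged when sqrt(dx**2 + dy**2) > epsilon for any selected body
                # part; the "+ 1" maps the diff index onto the frame the point jumped to.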
            elif outlieralgorithm == 'fitting':
                #deviation_dataname = str(Path(videofolder)/Path(dataname))
                # Calculate deviations for video
                [d, o] = ComputeDeviations(Dataframe, cfg, bodyparts, scorer,
                                           dataname, p_bound, alpha, ARdegree,
                                           MAdegree)

                #Some heuristics for extracting frames based on distance:
                Indices = np.where(
                    d > epsilon
                )[0]  # time points with at least average difference of epsilon

                if len(Indices) < cfg['numframes2pick'] * 2 and len(d) > cfg[
                        'numframes2pick'] * 2:  # if too few points qualify, extract the most distant ones.
                    Indices = np.argsort(d)[::-1][:cfg['numframes2pick'] * 2]
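                # d holds the per-frame distance between the measured trajectory and the
                # SARIMAX fit computed in ComputeDeviations; o is returned but not used here.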
            elif outlieralgorithm == 'manual':
                wd = Path(config).resolve().parents[0]
                os.chdir(str(wd))
                from deeplabcut.refine_training_dataset import outlier_frame_extraction_toolbox

                outlier_frame_extraction_toolbox.show(config, video, shuffle,
                                                      Dataframe, scorer,
                                                      savelabeled)
            # Always run, except when outlieralgorithm == 'manual'.
            if outlieralgorithm != 'manual':
                Indices = np.sort(list(set(Indices)))  #remove repetitions.
                print("Method ", outlieralgorithm, " found ", len(Indices),
                      " putative outlier frames.")
                print("Do you want to proceed with extracting ",
                      cfg['numframes2pick'], " of those?")
                if outlieralgorithm == 'uncertain':
                    print(
                        "If this list is very large, perhaps consider changing the paramters (start, stop, p_bound, comparisonbodyparts) or use a different method."
                    )
                elif outlieralgorithm == 'jump':
                    print(
                        "If this list is very large, perhaps consider changing the paramters (start, stop, epsilon, comparisonbodyparts) or use a different method."
                    )
                elif outlieralgorithm == 'fitting':
                    print(
                        "If this list is very large, perhaps consider changing the paramters (start, stop, epsilon, ARdegree, MAdegree, alpha, comparisonbodyparts) or use a different method."
                    )

                if not automatic:
                    askuser = input("yes/no")
                else:
                    askuser = 'yes'  # auto-confirm so that extraction proceeds without a prompt

                if askuser == 'y' or askuser == 'yes' or askuser == 'Ja' or askuser == 'ha':  # multilanguage support :)
                    #Now extract from those Indices!
                    ExtractFramesbasedonPreselection(
                        Indices, extractionalgorithm, Dataframe, dataname,
                        scorer, video, cfg, config, opencv,
                        cluster_resizewidth, cluster_color, savelabeled)
                else:
                    print(
                        "Nothing extracted, please change the parameters and start again..."
                    )
Example #2
def extract_outlier_frames(
    config,
    videos,
    videotype=".avi",
    shuffle=1,
    trainingsetindex=0,
    outlieralgorithm="jump",
    comparisonbodyparts="all",
    epsilon=20,
    p_bound=0.01,
    ARdegree=3,
    MAdegree=1,
    alpha=0.01,
    extractionalgorithm="kmeans",
    automatic=False,
    cluster_resizewidth=30,
    cluster_color=False,
    opencv=True,
    savelabeled=False,
    destfolder=None,
    modelprefix="",
    track_method="",
):
    """
    Extracts outlier frames for a video in case the predictions are not correct, considering only the part of
    the video running from 'start' to 'stop' as defined in config.yaml.

    Another crucial parameter in config.yaml is how many frames to extract: 'numframes2pick'.

    Parameters
    ----------
    config : string
        Full path of the config.yaml file as a string.

    videos : list
        A list of strings containing the full paths to videos for analysis or a path to the directory, where all the videos with same extension are stored.

    videotype: string, optional
        Checks for the extension of the video in case the input is a directory.\n Only videos with this extension are analyzed. The default is ``.avi``

    shuffle : int, optional
        The shuffle index of training dataset. The extracted frames will be stored in the labeled-dataset for
        the corresponding shuffle of training dataset. Default is set to 1

    trainingsetindex: int, optional
        Integer specifying which TrainingFraction to use. By default the first (note that TrainingFraction is a list in config.yaml).

    outlieralgorithm: 'fitting', 'jump', 'uncertain', or 'manual'
        String specifying the algorithm used to detect the outliers. Currently, deeplabcut supports three methods + a manual GUI option. 'fitting'
        fits an AutoRegressive Integrated Moving Average (ARIMA) model to the data and computes the distance to the estimated data. Distances larger than
        epsilon are then potentially identified as outliers. The method 'jump' identifies jumps larger than 'epsilon' in any body part; and 'uncertain'
        looks for frames with confidence below p_bound. The default is set to ``jump``.

    comparisonbodyparts: list of strings, optional
        This selects the body parts for which the comparisons with the outliers are carried out. Either ``all``, then all body parts
        from config.yaml are used, or a list of strings that are a subset of the full list.
        E.g. ['hand','Joystick'] for the demo Reaching-Mackenzie-2018-08-30/config.yaml to select only these two body parts.

    p_bound: float between 0 and 1, optional
        For outlieralgorithm 'uncertain' this parameter defines the likelihood below which a body part will be flagged as a putative outlier.

    epsilon: float, optional
        Meaning depends on outlieralgorithm. The default is set to 20 pixels.
        For outlieralgorithm 'fitting': float bound according to which frames are picked when the (average) body part estimate deviates from the model fit.
        For outlieralgorithm 'jump': float bound specifying the distance by which body points jump from one frame to the next (Euclidean distance).

    ARdegree: int, optional
        For outlieralgorithm 'fitting': autoregressive degree of the SARIMAX model. (Note we use SARIMAX without exogenous and seasonal parts.)
        see https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

    MAdegree: int, optional
        For outlieralgorithm 'fitting': moving average degree of the SARIMAX model. (Note we use SARIMAX without exogenous and seasonal parts.)
        See https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

    alpha: float, optional
        Significance level for detecting outliers based on the confidence interval of the fitted ARIMA model. However, only the distance is used.

    extractionalgorithm : string, optional
        String specifying the algorithm to use for selecting the frames from the identified putative outlier frames. Currently, deeplabcut
        supports either ``kmeans`` or ``uniform`` based selection (same logic as for extract_frames).
        The default is set to ``kmeans``; if provided it must be either ``uniform`` or ``kmeans``.

    automatic : bool, optional
        Set it to True if you want to extract outliers without being asked for user feedback.

    cluster_resizewidth: number, default: 30
        For k-means one can change the width to which the images are downsampled (aspect ratio is fixed).

    cluster_color: bool, default: False
        If false then each downsampled image is treated as a grayscale vector (discarding color information). If true, then the color channels are considered. This increases
        the computational complexity.

    opencv: bool, default: True
        Uses OpenCV for loading & extracting frames (otherwise moviepy (legacy)).

    savelabeled: bool, default: False
        If True, also saves frames with predicted labels in each folder.

    destfolder: string, optional
        Specifies the destination folder that was used for storing analysis data (default is the path of the video).

    modelprefix: str, optional
        Directory prefix for the model folders within the project. By default, the models are assumed to live directly in the project folder.

    track_method: string, optional
        Specifies the tracker used to generate the pose data. Empty by default (single-animal project); for multi-animal projects this is typically 'box', 'skeleton', or 'ellipse'.

    Examples
    --------
    Windows example for extracting the frames with default settings
    >>> deeplabcut.extract_outlier_frames('C:\\myproject\\reaching-task\\config.yaml',['C:\\yourusername\\rig-95\\Videos\\reachingvideo1.avi'])

    For extracting the frames with default settings
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'])

    For extracting the frames with kmeans
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'],extractionalgorithm='kmeans')

    For extracting the frames with kmeans and epsilon = 5 pixels
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'],epsilon=5,extractionalgorithm='kmeans')
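
    For extracting frames whose likelihood falls below p_bound = 0.1 (same demo paths as above, using the 'uncertain' detector)
    >>> deeplabcut.extract_outlier_frames('/analysis/project/reaching-task/config.yaml',['/analysis/project/video/reachinvideo1.avi'],outlieralgorithm='uncertain',p_bound=0.1)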
    """

    cfg = auxiliaryfunctions.read_config(config)
    bodyparts = auxiliaryfunctions.IntersectionofBodyPartsandOnesGivenbyUser(
        cfg, comparisonbodyparts)
    if not len(bodyparts):
        raise ValueError("No valid bodyparts were selected.")

    DLCscorer, DLCscorerlegacy = auxiliaryfunctions.GetScorerName(
        cfg,
        shuffle,
        trainFraction=cfg["TrainingFraction"][trainingsetindex],
        modelprefix=modelprefix,
    )

    Videos = auxiliaryfunctions.Getlistofvideos(videos, videotype)
    if len(Videos) == 0:
        print("No suitable videos found in", videos)

    for video in Videos:
        if destfolder is None:
            videofolder = str(Path(video).parents[0])
        else:
            videofolder = destfolder
        vname = os.path.splitext(os.path.basename(video))[0]

        try:
            df, dataname, _, _ = auxiliaryfunctions.load_analyzed_data(
                videofolder, vname, DLCscorer, track_method=track_method)
            nframes = len(df)
            startindex = max([int(np.floor(nframes * cfg["start"])), 0])
            stopindex = min([int(np.ceil(nframes * cfg["stop"])), nframes])
            Index = np.arange(stopindex - startindex) + startindex

            df = df.iloc[Index]
            mask = df.columns.get_level_values("bodyparts").isin(bodyparts)
            df_temp = df.loc[:, mask]
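            # df_temp is restricted to the selected body parts and the [start, stop] interval;
            # all outlier detectors below operate on this slice only.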
            Indices = []
            if outlieralgorithm == "uncertain":
                p = df_temp.xs("likelihood", level=-1, axis=1)
                ind = df_temp.index[(p < p_bound).any(axis=1)].tolist()
                Indices.extend(ind)
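                # (p < p_bound).any(axis=1) flags every frame in which at least one selected
                # body part falls below the likelihood bound.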
            elif outlieralgorithm == "jump":
                temp_dt = df_temp.diff(axis=0)**2
                temp_dt.drop("likelihood", axis=1, level=-1, inplace=True)
                sum_ = temp_dt.sum(axis=1, level=1)
                ind = df_temp.index[(sum_ > epsilon**2).any(axis=1)].tolist()
                Indices.extend(ind)
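                # Squared x/y differences are summed per column group (level 1, i.e. per body
                # part for single-animal data) and compared against epsilon**2, flagging frames
                # with a jump larger than epsilon pixels without taking a square root.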
            elif outlieralgorithm == "fitting":
                d, o = compute_deviations(df_temp, dataname, p_bound, alpha,
                                          ARdegree, MAdegree)
                # Some heuristics for extracting frames based on distance:
                ind = np.flatnonzero(
                    d > epsilon
                )  # time points with at least average difference of epsilon
                if (
                        len(ind) < cfg["numframes2pick"] * 2
                        and len(d) > cfg["numframes2pick"] * 2
                ):  # if too few points qualify, extract the most distant ones.
                    ind = np.argsort(d)[::-1][:cfg["numframes2pick"] * 2]
                Indices.extend(ind)
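                # d is the per-frame deviation between the data and the ARIMA/SARIMAX fit from
                # compute_deviations; o is returned but not used for flagging.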
            elif outlieralgorithm == "manual":
                wd = Path(config).resolve().parents[0]
                os.chdir(str(wd))
                from deeplabcut.refine_training_dataset import (
                    outlier_frame_extraction_toolbox, )

                outlier_frame_extraction_toolbox.show(
                    config,
                    video,
                    shuffle,
                    df_temp,
                    savelabeled,
                    cfg.get("multianimalproject", False),
                )

            # Always run, except when outlieralgorithm == "manual".
            if outlieralgorithm != "manual":
                Indices = np.sort(list(set(Indices)))  # remove repetitions.
                print(
                    "Method ",
                    outlieralgorithm,
                    " found ",
                    len(Indices),
                    " putative outlier frames.",
                )
                print(
                    "Do you want to proceed with extracting ",
                    cfg["numframes2pick"],
                    " of those?",
                )
                if outlieralgorithm == "uncertain" or outlieralgorithm == "jump":
                    print(
                        "If this list is very large, perhaps consider changing the parameters "
                        "(start, stop, p_bound, comparisonbodyparts) or use a different method."
                    )
                elif outlieralgorithm == "fitting":
                    print(
                        "If this list is very large, perhaps consider changing the parameters "
                        "(start, stop, epsilon, ARdegree, MAdegree, alpha, comparisonbodyparts) "
                        "or use a different method.")

                if not automatic:
                    askuser = input("yes/no")
                else:
                    askuser = "******"

                if (askuser == "y" or askuser == "yes" or askuser == "Ja"
                        or askuser == "ha"):  # multilanguage support :)
                    # Now extract from those Indices!
                    ExtractFramesbasedonPreselection(
                        Indices,
                        extractionalgorithm,
                        df_temp,
                        dataname,
                        video,
                        cfg,
                        config,
                        opencv,
                        cluster_resizewidth,
                        cluster_color,
                        savelabeled,
                    )
                else:
                    print(
                        "Nothing extracted, please change the parameters and start again..."
                    )
        except FileNotFoundError as e:
            print(e)
            print(
                "It seems the video has not been analyzed yet, or the video is not found! "
                "You can only refine the labels after the a video is analyzed. Please run 'analyze_video' first. "
                "Or, please double check your video file path")