Example #1
def clusterizer(profile, **kwargs):
    """Clusters each resource to an appropriate cluster in order to be postprocessable
    by regression analysis.

    \b
      * **Limitations**: `none`
      * **Dependencies**: `none`

    Clusterizer tries to find a suitable cluster for each resource in the profile. The clusters
    are computed either w.r.t. the sort order of the resource amounts, or according to a
    sliding window.

    The sliding window can be further adjusted by setting its **width** (i.e. how many near values
    on the x axis we fit to a cluster) and its **height** (i.e. how big an interval of
    resource amounts will be considered for one cluster). Both **width** and **height** can be
    further augmented. **Width** can either be `absolute`, where we take at most the absolute
    number of resources, `relative`, where we take at most the given percentage of the number of
    resources for each cluster, or `weighted`, where we take the number of resources depending on
    the frequency of their occurrences. Similarly, the **height** can either be `absolute`, where
    we set the interval of amounts to an absolute size, or `relative`, where we set the interval
    of amounts relative to the first resource amount in the cluster (so e.g. if we have a window
    of height 0.1 and the first resource in the cluster has an amount of 100, we will cluster
    every resource in the interval 100 to 110 into this cluster).
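
    The following is a minimal sketch (a hypothetical helper, not the actual clusterizer
    implementation) of how a `relative` **height** window of 0.1 could group sorted amounts:

    .. code-block:: python

        def cluster_by_relative_height(amounts, height=0.1):
            # a new cluster starts whenever an amount exceeds the first amount
            # of the current cluster by more than `height` (here 10 %)
            clusters, current = [], []
            for amount in sorted(amounts):
                if current and amount > current[0] * (1 + height):
                    clusters.append(current)
                    current = []
                current.append(amount)
            if current:
                clusters.append(current)
            return clusters

        # cluster_by_relative_height([100, 104, 110, 250]) -> [[100, 104, 110], [250]]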

    For more details about the clusterizer refer to :ref:`postprocessors-clusterizer`.
    """
    runner.run_postprocessor_on_profile(profile, 'clusterizer', kwargs)
Example #2
def simple_moving_average(ctx, **kwargs):
    """ **Simple Moving Average**

        In most cases it is an unweighted Moving Average; this means that each x-coordinate
        in the data set (profiled resources) has equal importance and is weighted equally. The `mean`
        is then computed from the previous `n` data points (`<no-center>`), where `n` denotes the
        `<window-width>`. However, in science and engineering the mean is normally taken from an
        equal number of data points on either side of a central value (`<center>`). This ensures that
        variations in the mean are aligned with variations in the data rather than being shifted
        in the x-axis direction. Since the window at the boundaries of the interval usually does not
        contain enough points, it is necessary to specify the value of `<min-periods>` to avoid a NaN
        result. The role of the weighting function in this approach belongs to `<window-type>`, which
        represents the suite of the following window functions for filtering:

            - **boxcar**: known as rectangular or Dirichlet window, is equivalent to no window at all: --
            - **triang**: standard triangular window: /\
            - **blackman**: formed by using three terms of a summation of cosines, minimal leakage, close to optimal
            - **hamming**: formed by using a raised cosine with non-zero endpoints, minimize the nearest side lobe
            - **bartlett**: similar to triangular, endpoints are at zero, processing of tapering data sets
            - **parzen**: can be regarded as a generalization of k-nearest neighbor techniques
            - **bohman**: convolution of two half-duration cosine lobes
            - **blackmanharris**: minimum in the sense that its maximum side lobes are minimized (symmetric 4-term)
            - **nuttall**: minimum 4-term Blackman-Harris window according to Nuttall (so called 'Nuttall4c')
            - **barthann**: has a main lobe at the origin and asymptotically decaying side lobes on both sides
            - **kaiser**: formed by using a Bessel function, needs beta value (set to 14 - good starting point)

            .. _SciPyWindow: https://docs.scipy.org/doc/scipy/reference/signal.windows.html#module-scipy.signal.windows

            For more details about these window functions or for their visual representation, see SciPyWindow_.
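
            The following is a minimal sketch (assuming the profiled amounts are stored in a pandas
            Series; this is not the postprocessor's actual code) of a centered simple moving average
            with a triangular window:

            .. code-block:: python

                import pandas as pd

                amounts = pd.Series([3.0, 5.0, 4.0, 8.0, 7.0, 9.0, 6.0])
                # window of width 3, centered around each point, triangular weighting
                # (win_type requires SciPy); min_periods=1 avoids NaN at the boundaries
                sma = amounts.rolling(window=3, center=True, min_periods=1,
                                      win_type='triang').mean()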
    """
    kwargs.update({'moving_method': 'sma'})
    kwargs.update(ctx.parent.params)
    runner.run_postprocessor_on_profile(ctx.obj, 'moving_average', kwargs)
Example #3
def exponential_moving_average(ctx, **kwargs):
    """ **Exponential Moving Average**

        This method is a type of moving average method, also known as the **Exponential** Weighted
        Moving Average, that places a greater weight and significance on the most recent data points.
        The weighting of each older x-coordinate decreases exponentially, but never reaches zero.
        This approach of moving average reacts more significantly to recent changes than a *Simple*
        Moving Average, which applies an equal weight to all observations in the period. To calculate
        an EMA, the **Simple** Moving Average (SMA) is first computed over a particular sub-interval.
        In the next step, the multiplier for smoothing (weighting) the EMA is calculated; it depends
        on the selected formula, and the following options are supported (`<decay>`):

            - **com**: specify decay in terms of center of mass: :math:`{\\alpha}` = 1 / (1 + com), for com >= 0
            - **span**: specify decay in terms of span: :math:`{\\alpha}` = 2 / (span + 1), for span >= 1
            - **halflife**: specify decay in terms of half-life, :math:`{\\alpha}` = 1 - exp(log(0.5) / halflife), for halflife > 0
            - **alpha**: specify smoothing factor :math:`{\\alpha}` directly: 0 < :math:`{\\alpha}` <= 1

        The computed coefficient :math:`{\\alpha}` represents the degree of weighting decrease, i.e. a
        constant smoothing factor. A higher value of :math:`{\\alpha}` discounts older observations
        faster, a smaller value does the contrary. Finally, the current value of the EMA is calculated
        using the relevant formula. It is important not to confuse the **Exponential** Moving Average
        with the **Simple** Moving Average. An **Exponential** Moving Average behaves quite differently
        from the latter method, because it is a function of the weighting factor or the length of the
        average.
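
        The following is a minimal sketch (assuming the profiled amounts are stored in a pandas
        Series; this is not the postprocessor's actual code) of an EMA with the decay given in
        terms of `span`:

        .. code-block:: python

            import pandas as pd

            amounts = pd.Series([3.0, 5.0, 4.0, 8.0, 7.0, 9.0, 6.0])
            # span = 5 corresponds to alpha = 2 / (span + 1) ~ 0.33
            ema = amounts.ewm(span=5).mean()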
    """
    kwargs.update({
        'moving_method': 'ema',
        'window_width': kwargs['decay'][1],
        'decay': kwargs['decay'][0]
    })
    kwargs.update(ctx.parent.params)
    runner.run_postprocessor_on_profile(ctx.obj, 'moving_average', kwargs)
Example #4
def normalizer(profile):
    """Normalizes performance profile into flat interval.

    \b
      * **Limitations**: `none`
      * **Dependencies**: `none`

    Normalizer is a postprocessor, which iterates through all of the snapshots
    and normalizes the resources of the same type to the interval ``(0, 1)``, where
    ``1`` corresponds to the maximal value of the given type.

    Consider the following list of resources for one snapshot generated by
    :ref:`collectors-time`:

    .. code-block:: json

        \b
        [
            {
                "amount": 0.59,
                "uid": "sys"
            }, {
                "amount": 0.32,
                "uid": "user"
            }, {
                "amount": 2.32,
                "uid": "real"
            }
        ]

    Normalizer yields the following set of resources:

    .. code-block:: json

        \b
        [
            {
                "amount": 0.2543103448275862,
                "uid": "sys"
            }, {
                "amount": 0.13793103448275865,
                "uid": "user"
            }, {
                "amount": 1.0,
                "uid": "real"
            }
        ]
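
    The new amounts thus correspond to the original amounts divided by the maximal amount of
    the given type (here all three resources share one type), roughly as in the following
    sketch (not the actual implementation):

    .. code-block:: python

        \b
        amounts = {'sys': 0.59, 'user': 0.32, 'real': 2.32}
        maximum = max(amounts.values())
        normalized = {uid: amount / maximum for uid, amount in amounts.items()}
        # {'sys': 0.2543..., 'user': 0.1379..., 'real': 1.0}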

    Refer to :ref:`postprocessors-normalizer` for more thorough description and
    examples of `normalizer` postprocessor.
    """
    runner.run_postprocessor_on_profile(profile, 'normalizer', {})
Example #5
def performance_profile_postprocess(repo_path, profile_realpath, specification, registered):
    """Function for postprocessing the given profile with a postprocessor given by specification
    :param str repo_path: path to repository
    :param str profile_realpath: path to profile
    :param dict specification: dictionary containging postprocessor specification
    :param bool registered: registration status of the profile
    :return: 200 OK if successfull, 404 NOT FOUND otherwise
    """
    original_path = os.getcwd()
    os.chdir(repo_path)

    try:
        perf_profile = profile.load_profile_from_file(profile_realpath, is_raw_profile=(not registered))
        
        name = ''
        arguments = ''

        if specification['name'].lower() == 'regression analysis':
            name = 'regression_analysis'
            arguments = {'method': '', 'regression_models': '', 'steps': '', 'of_key': '', 'per_key': ''}
            for param in specification['parameters']:
                if param['param'] == 'method':
                    arguments['method'] = param['options'][0]
                elif param['param'] == 'regression models':
                    arguments['regression_models'] = param['options']
                elif param['param'] == 'steps':
                    arguments['steps'] = param['options'][0]
                elif param['param'] == 'of':
                    arguments['of_key'] = param['options'][0]
                elif param['param'] == 'depending on':
                    arguments['per_key'] = param['options'][0]

        elif specification['name'].lower() == 'normalizer':
            arguments = {}
            name = 'normalizer'
        elif specification['name'].lower() == 'filter':
            arguments = {}
            name = 'filter'
        
        runner.run_postprocessor_on_profile(perf_profile, name, arguments)
        os.chdir(original_path)
        return create_response('Profile postprocessed successfully', 200)

    except Exception as e:
        os.chdir(original_path)
        eprint(e)
        return create_response(str(e), 404)
Example #6
def simple_moving_median(ctx, **kwargs):
    """ **Simple Moving Median**

        The second representative of the Simple Moving Average methods is the Simple Moving **Median**.
        The same rules as in the first described method apply to this method as well, except for the
        option of choosing the window type, which does not make sense in this approach. The only
        difference between these two methods is the way the values in the individual sub-intervals
        are computed. The Simple Moving **Median** is not based on the computation of the average but,
        as the name suggests, on the **median**.
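
        The following is a minimal sketch (assuming the profiled amounts are stored in a pandas
        Series; this is not the postprocessor's actual code) of a centered simple moving median:

        .. code-block:: python

            import pandas as pd

            amounts = pd.Series([3.0, 5.0, 4.0, 8.0, 7.0, 9.0, 6.0])
            # the same window parameters as for the simple moving average, but the value
            # of each sub-interval is the median instead of the (weighted) mean
            smm = amounts.rolling(window=3, center=True, min_periods=1).median()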
    """
    kwargs.update({'moving_method': 'smm'})
    kwargs.update(ctx.parent.params)
    runner.run_postprocessor_on_profile(ctx.obj, 'moving_average', kwargs)
Example #7
def regressogram(profile, **kwargs):
    """
    Execution of the interleaving of profiled resources by **regressogram** models.

    \b
      * **Limitations**: `none`
      * **Dependencies**: `none`

    Regressogram belongs to the simplest non-parametric methods and its properties are
    the following:

        **Regressogram**: can be described as a step function (i.e. a piecewise constant
        function). The regressogram uses the same basic idea as a histogram for density
        estimation. The idea is to divide the set of values of the x-coordinates (`<per_key>`)
        into intervals, and the estimate of a point in a concrete interval takes the mean or
        median, respectively, of the y-coordinates (`<of_resource_key>`) on this sub-interval.
        We currently use the `coefficient of determination` (:math:`R^2`) to measure the fitness
        of the regressogram. The fitness of the regressogram model estimation depends primarily
        on the number of buckets into which the interval is divided. The user can choose the
        number of buckets manually (`<bucket_window>`) or use one of the following methods to
        estimate the optimal number of buckets (`<bucket_method>`):

            - **sqrt**: square root (of data size) estimator, used for its speed and simplicity
            - **rice**: does not take variability into account, only data size and commonly overestimates
            - **scott**: takes into account data variability and data size, less robust estimator
            - **stone**: based on leave-one-out cross validation estimate of the integrated squared error
            - **fd**: robust, takes into account data variability and data size, resilient to outliers
            - **sturges**: only accounts for data size, underestimates for large non-gaussian data
            - **doane**: generalization of Sturges' formula, works better with non-gaussian data
            - **auto**: max of the Sturges' and 'fd' estimators, provides good all around performance

        .. _SciPy: https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bin_edges

        For more details about these methods for estimating the optimal number of buckets,
        or to view their code, visit SciPy_.
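
        The following is a minimal sketch of the regressogram idea (using NumPy directly on
        made-up points; not the postprocessor's actual code): the x-interval is divided into
        buckets and the estimate in each bucket is the mean of the y-values falling into it.

        .. code-block:: python

            import numpy as np

            x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
            y = np.array([2.0, 2.1, 3.9, 4.2, 4.0, 8.5, 8.4, 8.8])
            edges = np.histogram_bin_edges(x, bins='doane')   # or an explicit bucket count
            buckets = np.digitize(x, edges[1:-1])             # bucket index of every x
            estimate = [y[buckets == b].mean() for b in np.unique(buckets)]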

    For more details about this approach of non-parametric analysis refer to :ref:`postprocessors-regressogram`.
    """
    runner.run_postprocessor_on_profile(profile, 'regressogram', kwargs)
Example #8
def exec_fast_check(baseline_profile, baseline_x_pts, abs_error):
    """The function executes the classification of performance change between two profiles with
    using regression analysis. The type of the best model from the regressed profile, which
    contains the value absolute error, computed from the best models of both profile, is returned
    such as the degree of the changes.

    :param dict baseline_profile: baseline against which we are checking the degradation
    :param np_array baseline_x_pts: the value absolute error computed from the linear models
        obtained from both profiles
    :param integer abs_error: values of the independent variables from both profiles
    :returns: string (classification of the change)
    """
    # creating the new profile
    std_err_profile = copy.deepcopy(baseline_profile)
    del std_err_profile['global']['models']

    # filling the resources of the new profile with the (x, y) points of the absolute error
    for i, (x, y) in enumerate(
            zip(np.nditer(baseline_x_pts), np.nditer(abs_error))):
        std_err_profile['global']['resources'][i]['structure-unit-size'] = x
        std_err_profile['global']['resources'][i]['amount'] = y

    # Fixme: Extract of and per key
    regression_analysis_params = {
        "regression_models": [],
        "steps": 3,
        "method": "full",
        "of_key": "amount",
        "per_key": "structure-unit-size"
    }
    _, std_err_profile = runner.run_postprocessor_on_profile(
        std_err_profile,
        'regression_analysis',
        regression_analysis_params,
        skip_store=True)

    return std_err_profile
Example #9
def regression_analysis(profile, **kwargs):
    """..."""
    runner.run_postprocessor_on_profile(profile, 'mypostprocessor', kwargs)
Example #10
def filter(profile):
    """Filtering of the resources according ot the given query"""
    runner.run_postprocessor_on_profile(profile, 'filter', {})
Example #11
def regression_analysis(profile, **kwargs):
    """Finds fitting regression models to estimate models of profiled resources.

    \b
      * **Limitations**: Currently limited to models of `amount` depending on
        `structural-unit-size`
      * **Dependencies**: :ref:`collectors-complexity`

    Regression analyzer tries to find a fitting model to estimate the `amount`
    of resources depending on `structural-unit-size`.

    The following strategies are currently available:

        1. **Full Computation** uses all of the data points to obtain the best
           fitting model for each type of model from the database (unless
           ``--regression_models``/``-r`` restrict the set of models)

        2. **Iterative Computation** uses a percentage of data points to obtain
           some preliminary models together with their errors or fitness. The
           most fitting model is then expanded, until it is fully computed or
           some other model becomes more fitting.

        3. **Full Computation with initial estimate** first uses some percentage
           of the data to estimate which model would be the best fitting. The
           given model is then fully computed.

        4. **Interval Analysis** uses a finer set of intervals of the data and
           estimates models for each interval, providing more precise modeling
           of the profile.

        5. **Bisection Analysis** fully computes the models for the full interval.
           Then it splits the interval and computes new models for the halves. If
           the best fitting models change for the sub-intervals, we continue with
           the splitting.

    Currently we support **linear**, **quadratic**, **power**, **logarithmic**
    and **constant** models and use the `coefficient of determination`
    (:math:`R^2`) to measure the fitness of a model. The models are stored as
    follows:

    .. code-block:: json

        \b
        {
            "uid": "SLList_insert(SLList*, int)",
            "r_square": 0.0017560012128507133,
            "coeffs": [
                {
                    "value": 0.505375215875552,
                    "name": "b0"
                },
                {
                    "value": 9.935159839322705e-06,
                    "name": "b1"
                }
            ],
            "x_interval_start": 0,
            "x_interval_end": 11892,
            "model": "linear",
            "method": "full",
        }
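
    As a rough illustration (made-up data points, not the analyzer's actual code), the
    coefficients and the coefficient of determination of a linear model can be obtained
    e.g. as follows:

    .. code-block:: python

        \b
        import numpy as np
        x = np.array([10, 200, 3000, 8000, 11892], dtype=float)
        y = np.array([0.50, 0.51, 0.53, 0.58, 0.62])
        b1, b0 = np.polyfit(x, y, 1)            # linear model: amount = b0 + b1 * x
        prediction = b0 + b1 * x
        r_square = 1 - ((y - prediction) ** 2).sum() / ((y - y.mean()) ** 2).sum()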

    For more details about regression analysis refer to
    :ref:`postprocessors-regression-analysis`. For more details how to collect
    suitable resources refer to :ref:`collectors-complexity`.
    """
    runner.run_postprocessor_on_profile(profile, 'regression_analysis', kwargs)