class BackpropagationAlgorithm(IterativeAlgorithm):
    r"""Backpropagation algorithm for multilayer perceptrons.

    INPUT:

    - ``self`` -- object on which the function is invoked.

    - ``sample`` -- list or tuple of ``LabeledExample`` containing the sample
      to be learnt.

    - ``dimensions`` -- list or tuple of numerical values describing the number
      of layers and the number of units therein, including the input layer.

    - ``threshold`` -- boolean (default: ``True``) flag setting the use of
      thresholded perceptrons.

    - ``activations`` -- ActivationFunction or list/tuple of ActivationFunction
      (default: ``SigmoidActivationFunction()``) activation functions to be
      used for the perceptron units. When a single value is specified, it
      applies to all units. When a list/tuple is specified, each element
      applies to all units of the corresponding perceptron layer.

    - ``weight_bound`` -- float (default: 0.1) upper bound of the interval in
      which the initial weights and thresholds are chosen uniformly at random
      (the lower bound of this interval is ``-1 * weight_bound``). A
      ``ValueError`` is raised if this parameter is not positive.
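
    The rule for ``activations`` can be sketched as follows. This is a
    hypothetical plain-Python helper, not yaplf code, and it assumes that the
    argument of ``SigmoidActivationFunction`` is a steepness `\beta` in the
    logistic function `1 / (1 + e^{-\beta x})`:

```python
import math


def sigmoid(beta=1.0):
    # Hypothetical stand-in for SigmoidActivationFunction(beta); the role
    # of beta as a steepness is an assumption.
    return lambda x: 1.0 / (1.0 + math.exp(-beta * x))


def activation_for_layer(activations, layer):
    # A single function applies to every perceptron layer, while a list or
    # tuple holds one function per perceptron layer.
    if isinstance(activations, (list, tuple)):
        return activations[layer]
    return activations


shared = sigmoid(10)
per_layer = (sigmoid(2), sigmoid(10))
print(activation_for_layer(shared, 1)(0.0))     # 0.5
print(activation_for_layer(per_layer, 0)(0.0))  # 0.5
```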


    OUTPUT:

    ``BackpropagationAlgorithm`` object.

    EXAMPLES:

    Consider the following data set summarizing the binary XOR function:

    ::

        >>> from yaplf.data import LabeledExample
        >>> xor_sample = [LabeledExample((0, 0), (0,)),
        ... LabeledExample((0, 1), (1,)), LabeledExample((1, 0), (1,)),
        ... LabeledExample((1, 1), (0,))]

    This is a paradigmatic example of a non-linearly separable data set, which
    needs a richer model than a single perceptron in order to be learnt:

    ::

        >>> from yaplf.utility.activation import SigmoidActivationFunction
        >>> from yaplf.algorithms.neural.multilayer import BackpropagationAlgorithm
        >>> alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1),
        ... threshold = True, activations = SigmoidActivationFunction(10))
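
    To see why one hidden layer suffices for XOR, the following sketch
    hand-wires a (2, 2, 1) network of hard-threshold units in plain Python.
    The weights are picked by hand for illustration (an OR unit and a NAND
    unit feeding an AND unit); they are not what ``BackpropagationAlgorithm``
    learns, since the algorithm above uses sigmoid units:

```python
def step(net):
    # Hard-threshold unit: fires when its net input is positive.
    return 1 if net > 0 else 0


def xor_net(x1, x2):
    # Hidden layer: an OR unit and a NAND unit (hand-picked weights).
    h_or = step(x1 + x2 - 0.5)
    h_nand = step(-x1 - x2 + 1.5)
    # Output unit: AND of the two hidden units.
    return step(h_or + h_nand - 1.5)


for (x1, x2), label in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
    print((x1, x2), xor_net(x1, x2), label)
```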

    In order to actually run the algorithm it is necessary to specify a
    stopping criterion (the default behaviour only executes a single learning
    iteration, which is unlikely to get very far). To keep things simple, one
    can choose ``FixedIterationsStoppingCriterion`` so as to run a fixed number
    of iterations, say `5000`:

    ::

        >>> from yaplf.utility.stopping import FixedIterationsStoppingCriterion
        >>> alg.run(stopping_criterion = \
        ... FixedIterationsStoppingCriterion(5000), learning_rate = .1)
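
    A fixed-iterations criterion can be pictured as a counter queried once per
    learning iteration, as in the ``while`` loop in the body of ``run``. The
    class below is a hypothetical sketch, not the actual
    ``FixedIterationsStoppingCriterion``:

```python
class FixedIterations:
    # Hypothetical stand-in: stop() returns True once it has already been
    # queried num_iterations times.
    def __init__(self, num_iterations=1):
        self.num_iterations = num_iterations
        self.count = 0

    def stop(self):
        if self.count >= self.num_iterations:
            return True
        self.count += 1
        return False


criterion = FixedIterations(3)
steps = 0
while not criterion.stop():
    steps += 1  # one learning iteration would be performed here
print(steps)  # 3
```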

    The inferred model can be inspected through the ``model`` field in the
    ``LearningAlgorithm`` object:

    ::

        >>> alg.model # random
        MultilayerPerceptron((2, 2, 1), [array([[-0.49748418, -0.48592928],
        [-0.72052151, -0.69609958]]), array([[ 0.78258238, -0.84286672]])],
        thresholds = [array([ 0.71869556,  0.24337534]), array([-0.36203957])],
        activations = SigmoidActivationFunction(10))

    Note that the algorithm can be run in different flavours, described in the
    documentation of ``run``.

    One of the ways of assessing the performance of the algorithm is that of
    invoking the ``test`` function inherited from the ``Model`` class in order
    to see how each example has been classified:

    ::

        >>> alg.model.test(xor_sample, verbose = True) # random
        (0, 0) mapped to 0.0279358676426, label is (0,), error [ 0.00078041]
        (0, 1) mapped to 0.968320708961, label is (1,), error [ 0.00100358]
        (1, 0) mapped to 0.966511649371, label is (1,), error [ 0.00112147]
        (1, 1) mapped to 0.0429967333857, label is (0,), error [ 0.00184872]
        MSE 0.00118854472284
        0.0011885447228408164
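
    The printed figures are consistent with each per-example error being the
    squared difference between output and label, and with MSE being their
    mean. This reading of ``test`` is inferred from the output above rather
    than from its source; the following sketch recomputes the MSE from the
    printed values:

```python
# Outputs and labels copied from the verbose test output above.
outputs = [0.0279358676426, 0.968320708961, 0.966511649371, 0.0429967333857]
labels = [0, 1, 1, 0]

# Per-example error: squared difference between output and label.
errors = [(out - lab) ** 2 for out, lab in zip(outputs, labels)]
mse = sum(errors) / len(errors)
print(errors)
print(mse)  # close to the reported MSE 0.00118854472284
```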

    The named argument ``verbose`` activates a verbose output detailing how
    the error is distributed among the examples. Note that the output of
    ``test`` is likely to differ on each run, as the initial weights are chosen
    at random when the class constructor is called. The performance may also
    turn out to be unsatisfactory because some examples are not learnt at all.
    This can be an effect of many factors, for instance:

    - learning has not converged and more iterations are needed; thus further
      ``run`` invocations should be performed, suitably choosing the function
      parameters;

    - even though the chosen multilayer perceptron architecture is able to
      learn the data set, learning converged to an unsatisfactory model
      because of the randomly picked initial values; thus learning should be
      restarted, either invoking ``reset`` on the ``BackpropagationAlgorithm``
      object or creating a new instance of this class;

    - the chosen architecture cannot learn the data set; thus the whole process
      should be repeated modifying the initially chosen multilayer perceptron
      architecture, for instance choosing a different number of layers or a
      different number of units in the layers.

    In this case, another way of evaluating the inferred model is to visualize
    it graphically. As the perceptron has two inputs, it is possible to call
    its ``plot`` method specifying a region containing all the patterns
    supplied to the learning algorithm:

    ::

        >>> alg.model.plot((0, 1), (0, 1), shading = True)

    Learning can be monitored using the class ``ErrorTrajectory`` in package
    ``yaplf.graph.trajectory`` as follows:

    ::

        >>> from yaplf.graph.trajectory import ErrorTrajectory
        >>> alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1),
        ... activations = SigmoidActivationFunction(10))
        >>> errObs = ErrorTrajectory(alg)
        >>> alg.run(stopping_criterion = \
        ... FixedIterationsStoppingCriterion(1500))
        >>> errObs.get_trajectory(color='red', joined = True)

    In this way, at each learning iteration the inferred model is tested
    against the training set, so that the ``get_trajectory`` function returns
    a graph of the related error versus the iteration number.
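
    The mechanism behind ``ErrorTrajectory`` resembles the observer pattern:
    ``run`` calls ``notify_observers`` at the end of every iteration. The
    sketch below is a hypothetical, self-contained illustration of that
    pattern (the ``update`` method name and the fake halving error are
    assumptions, not yaplf's API):

```python
class ToyAlgorithm:
    # Hypothetical observable algorithm: each iteration halves a fake
    # training error and then notifies its observers.
    def __init__(self):
        self.observers = []
        self.error = 1.0

    def run(self, iterations):
        for _ in range(iterations):
            self.error /= 2
            for observer in self.observers:
                observer.update(self)


class TrajectoryRecorder:
    # Records (iteration, error) pairs, which could later be plotted.
    def __init__(self, algorithm):
        self.points = []
        algorithm.observers.append(self)

    def update(self, algorithm):
        self.points.append((len(self.points) + 1, algorithm.error))


alg = ToyAlgorithm()
recorder = TrajectoryRecorder(alg)
alg.run(3)
print(recorder.points)  # [(1, 0.5), (2, 0.25), (3, 0.125)]
```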

    As a final remark, it should be highlighted that these examples are shown
    for illustrative purposes only. A sound assessment of the performance of a
    learnt model requires more elaborate techniques, involving for instance the
    use of:

    - a test set in order to assess when learning should stop, using a
      different stopping criterion such as ``TestSetStoppingCriterion``;

    - a test set in order to graphically inspect how error changes as learning
      proceeds, specifying different arguments to the constructor of
      ``ErrorTrajectory``;

    - a cross validation procedure (see function ``cross_validation`` in
      package ``yaplf.utility.validation``) in order to choose the best
      perceptron architecture.

    Concerning the last point, the cross validation procedure selects, among a
    given set of choices, the one minimizing the estimated error. More
    precisely, consider the following code snippet:

    ::

        >>> tc_sample = (LabeledExample((0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9,
        ... 0.9, 0.9), (0.1,)), LabeledExample((0.9, 0.9, 0.9,  0.1, 0.9, 0.1,
        ...  0.1, 0.9, 0.1), (0.9,)), LabeledExample((0.9, 0.9, 0.9, 0.9, 0.1,
        ... 0.9, 0.9, 0.1, 0.9), (0.1,)), LabeledExample((0.1, 0.1, 0.9, 0.9,
        ... 0.9, 0.9, 0.1, 0.1, 0.9), (0.9,)), LabeledExample((0.9, 0.9, 0.9,
        ... 0.1, 0.1, 0.9, 0.9, 0.9, 0.9), (0.1,)), LabeledExample((0.1, 0.9,
        ... 0.1, 0.1, 0.9, 0.1, 0.9, 0.9, 0.9), (0.9,)), LabeledExample((0.9,
        ... 0.1, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9), (0.1,)),
        ... LabeledExample((0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1),
        ... (0.9,)))
        >>> from yaplf.utility.validation import cross_validation
        >>> p = cross_validation(BackpropagationAlgorithm, tc_sample,
        ... {'activations': (SigmoidActivationFunction(2),
        ... SigmoidActivationFunction(3), SigmoidActivationFunction(5),
        ... SigmoidActivationFunction(10))},
        ... fixed_parameters = {'dimensions': (9, 2, 1)},
        ... run_parameters = {'stopping_criterion': \
        ... FixedIterationsStoppingCriterion(1000), 'learning_rate': 1},
        ... num_folds = 4, verbose = True)
        Errors: [0.030319163781913711, 0.060571552036313515,
        0.029535618302241346, 0.4099711142423762]
        Minimum error in position 2
        Selected parameters (SigmoidActivationFunction(5),)

    Its effect is that of training a multilayer perceptron through the
    backpropagation algorithm starting from the sample in ``tc_sample``, as
    specified by the first two arguments, and analyzing four different
    activation functions. The third argument is a dictionary mapping the name
    of each parameter to be tuned to its candidate values: in this case it
    contains the single entry ``activations``, but more generally it can
    involve different named arguments of ``BackpropagationAlgorithm`` or of
    other learning algorithms. For each choice of the activation function,
    learning evolves as follows:

    - the sample is partitioned into 4 subsets, as specified by the
      ``num_folds`` named argument;

    - the learning algorithm is instantiated on the sample deprived of the
      first subset obtained in the previous point, using as named arguments
      those specified in ``fixed_parameters`` together with the one specifying
      the selected activation function; subsequently, the algorithm is run
      using the named arguments in ``run_parameters``;

    - the inferred multilayer perceptron is tested on the originally excluded
      subset of examples;

    - the whole process is repeated excluding the second subset, and so on up
      to the fourth; the test errors are then averaged and associated to the
      chosen activation function.

    Finally, the activation function yielding the minimum averaged test error
    is selected and used in order to infer a new multilayer perceptron starting
    from all examples.
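
    The procedure just described can be sketched in plain Python. The helper
    below is hypothetical and is not yaplf's ``cross_validation``: in
    particular, the way the sample is split into folds is an assumption, and
    a toy learner predicting a constant (``shrink`` times the mean training
    label) stands in for ``BackpropagationAlgorithm``:

```python
def cross_validation_sketch(train, error, sample, candidates, num_folds):
    # Picks the candidate with the lowest average held-out error, then
    # retrains on the whole sample with it.
    folds = [sample[i::num_folds] for i in range(num_folds)]  # assumed split
    averages = []
    for param in candidates:
        fold_errors = []
        for i, held_out in enumerate(folds):
            training = [ex for j, fold in enumerate(folds) if j != i
                        for ex in fold]
            model = train(training, param)
            fold_errors.append(error(model, held_out))
        averages.append(sum(fold_errors) / num_folds)
    best = min(range(len(candidates)), key=lambda k: averages[k])
    return candidates[best], train(sample, candidates[best]), averages


def train(examples, shrink):
    # Toy learner: predicts shrink times the mean training label.
    mean = sum(label for _, label in examples) / len(examples)
    return lambda pattern: shrink * mean


def mse(model, examples):
    return sum((model(p) - y) ** 2 for p, y in examples) / len(examples)


sample = [((x,), 1.0) for x in range(8)]
best, model, averages = cross_validation_sketch(train, mse, sample,
                                                (0.0, 0.5, 1.0), 4)
print(averages, best)  # [1.0, 0.25, 0.0] 1.0
```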

    AUTHORS:

    - Dario Malchiodi (2010-03-31)

    """

    def __init__(self, sample, dimensions=None, **kwargs):
        r"""
        See ``BackpropagationAlgorithm`` for full documentation.

        """

        IterativeAlgorithm.__init__(self, sample)

        # dimensions needs to be a named argument in order to be able to
        # cross-validate on it. The default value corresponds to three
        # layers, with the input and output ones automatically sized in order
        # to fit the provided data set, and the hidden one containing half of
        # the larger between the number of input and output units.

        if dimensions is not None:
            self.dimensions = dimensions
        else:
            n_in = len(sample[0].pattern)
            n_out = len(sample[0].label)
            self.dimensions = (n_in, int(max(n_in, n_out) / 2), n_out)

        num_input = self.dimensions[0]
        for example in sample:
            if len(example.pattern) != num_input:
                raise ValueError('Sample incompatible with number of units')

        try:
            self.activations = kwargs['activations']
            # self.dimensions is always set at this point, whereas the
            # dimensions argument may be None
            if len(self.activations) != len(self.dimensions):
                raise ValueError(
                    'Activations incompatible with number of layers')
        except KeyError:
            self.activations = SigmoidActivationFunction()
        except TypeError:
            # Raised by len if the argument is assigned a single activation
            pass
        # this calms down pylint
        self.threshold = None

        self.reset(**kwargs)

    def reset(self, **kwargs):
        r"""
        Reset weights and thresholds of the inferred MultilayerPerceptron
        picking values at random.

        INPUT:

        - ``self`` -- object on which the function is invoked.

        - ``threshold`` -- boolean (default: ``True``) flag setting the use of
          thresholded perceptrons.

        - ``weight_bound`` -- float (default: 0.1) upper bound of the interval
          in which the initial weights and thresholds are chosen uniformly at
          random (the lower bound of this interval is ``-1 * weight_bound``). A
          ``ValueError`` is raised if this parameter is not positive.
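
        The initialization performed by ``reset`` can be sketched in plain
        Python as follows. The helper is hypothetical (yaplf itself relies on
        NumPy arrays); it reproduces the shapes used in the body of this
        method, where layer `l` receives an `n_l \times n_{l-1}` connection
        matrix and, when thresholds are enabled, `n_l` thresholds:

```python
import random


def init_weights(dimensions, weight_bound=0.1, threshold=True, rng=random):
    # Hypothetical sketch: every value is drawn uniformly from
    # [-weight_bound, weight_bound].
    if weight_bound <= 0:
        raise ValueError('The weight_bound parameter should be positive')
    low, high = -weight_bound, weight_bound
    # One (units in layer l) x (units in layer l - 1) matrix per layer l >= 1.
    connections = [[[rng.uniform(low, high) for _ in range(n_prev)]
                    for _ in range(n_cur)]
                   for n_prev, n_cur in zip(dimensions[:-1], dimensions[1:])]
    thresholds = None
    if threshold:
        thresholds = [[rng.uniform(low, high) for _ in range(n_cur)]
                      for n_cur in dimensions[1:]]
    return connections, thresholds


connections, thresholds = init_weights((2, 2, 1), rng=random.Random(0))
print([len(m) for m in connections], [len(m[0]) for m in connections])
# [2, 1] [2, 2]
```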

        OUTPUT:

        No output. After the invocation the initialized model is available
        through the ``model`` field, in form of a ``MultilayerPerceptron``
        instance.

        EXAMPLES:

        Consider the following data set summarizing the binary XOR function:

        ::

            >>> from yaplf.data import LabeledExample
            >>> xor_sample = [LabeledExample((0, 0), (0,)),
            ... LabeledExample((0, 1), (1,)), LabeledExample((1, 0), (1,)),
            ... LabeledExample((1, 1), (0,))]

        This is a paradigmatic example of a non-linearly separable data set,
        which needs a richer model than a single perceptron in order to be
        learnt:

        ::

            >>> from yaplf.utility.activation import SigmoidActivationFunction
            >>> from yaplf.algorithms.neural import BackpropagationAlgorithm
            >>> alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1),
            ... threshold = True, activations = SigmoidActivationFunction(10))

        Suppose the algorithm is run for, say, 1000 iterations with the
        learning rate set to `0.1`, yielding the following results:

        ::

            >>> from yaplf.utility.stopping import \
            ... FixedIterationsStoppingCriterion
            >>> alg.run(stopping_criterion = \
            ... FixedIterationsStoppingCriterion(1000), learning_rate = .1)
            >>> alg.model.test(xor_sample, verbose = True) # random
            (0.100000000000000, 0.100000000000000) mapped to 0.107281883481,
            label is (0.100000000000000,), error [  5.30258270e-05]
            (0.100000000000000, 0.900000000000000) mapped to 0.889216991161,
            label is (0.900000000000000,), error [ 0.00011627]
            (0.900000000000000, 0.100000000000000) mapped to 0.350484814876,
            label is (0.900000000000000,), error [ 0.30196694]
            (0.900000000000000, 0.900000000000000) mapped to 0.353978869018,
            label is (0.100000000000000,), error [ 0.06450527]
            MSE 0.0916603759241
            0.0916603759241

        It is clear that the algorithm has learnt only three examples out of
        four. In order to check whether or not the algorithm converged, one
        can continue its execution for, say, another thousand iterations:

        ::

            >>> alg.run(stopping_criterion = \
            ... FixedIterationsStoppingCriterion(1000))
            >>> alg.model.test(xor_sample) # random
            0.0916130351364

        As the test error is essentially unchanged, it is reasonable to
        conclude that learning converged to a local minimum of the error
        function. In order to start a fresh session, hoping that a new random
        initialization overcomes this problem, one can create a new instance
        of ``BackpropagationAlgorithm`` or call ``reset`` on the already
        available instance:

        ::

            >>> alg.reset()
            >>> alg.run(stopping_criterion = \
            ... FixedIterationsStoppingCriterion(1000))
            >>> alg.model.test(xor_sample) # random
            1.01130579975e-05
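
        The role of the random initialization can be seen on a toy problem.
        The sketch below runs plain gradient descent on the one-dimensional
        function `f(w) = (w^2 - 1)^2 + 0.3 w`, chosen here for illustration
        because it has a local minimum near `w = 0.96` and a lower one near
        `w = -1.04`: which minimum is reached depends only on the starting
        point, which is why restarting from new random values may help:

```python
def f(w):
    # Toy error function with two minima of different depth.
    return (w ** 2 - 1) ** 2 + 0.3 * w


def f_prime(w):
    return 4 * w * (w ** 2 - 1) + 0.3


def descend(w, learning_rate=0.05, steps=500):
    # Plain gradient descent: it can only reach the minimum whose basin
    # contains the starting point.
    for _ in range(steps):
        w -= learning_rate * f_prime(w)
    return w


for start in (0.5, -0.5):
    w = descend(start)
    print(start, round(w, 3), round(f(w), 3))
```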

        AUTHORS:

        - Dario Malchiodi (2010-03-31)

        """

        try:
            self.threshold = kwargs['threshold']
        except KeyError:
            self.threshold = True

        try:
            # picks initial weights and threshold uniformly
            # between -weight_bound and weight_bound
            init_bound = kwargs['weight_bound']
            if init_bound <= 0:
                raise ValueError(
                    'The weight_bound parameter should be positive')
        except KeyError:
            init_bound = 0.1

        dims = transpose((self.dimensions[1:], self.dimensions[:-1]))
        connections = [random.uniform(-1 * init_bound, init_bound, shape)
            for shape in dims]

        perc_kwargs = {'activations': self.activations}
        if self.threshold:
            thr = [random.uniform(-1 * init_bound, init_bound, shape) \
                for shape in self.dimensions[1:]]
            perc_kwargs['thresholds'] = thr

        self.model = MultilayerPerceptron(self.dimensions, connections,
             **perc_kwargs)

    def run(self, **kwargs):
        r"""
        Run the learning algorithm.

        INPUT:

        - ``self`` -- object on which the function is invoked.

        - ``stopping_criterion`` -- ``StoppingCriterion`` instance (default:
          ``FixedIterationsStoppingCriterion()``, amounting to the execution of
          one learning step) describing the criterion to be fulfilled in order
          to stop the training phase.

        - ``batch`` -- boolean (default: ``False``, amounting to online
          learning mode) flag setting batch learning, i.e. model update after
          the presentation of all examples, instead of online learning, i.e.
          model update at each example presentation.

        - ``selector`` -- iterator (default: ``sequential_selector``, amounting
          to cycling through the available examples) selecting the next example
          to be fed to the learning algorithm.

        - ``learning_rate`` -- float (default: 0.1) value to be used as
          learning rate.

        - ``momentum_term`` -- float (default: 0) value to be used as momentum
          term.

        - ``min_error`` -- float (default: 0, which means connections
          and thresholds will always be updated) error value under which no
          update will occur on connections and thresholds.
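
        The way ``learning_rate`` and ``momentum_term`` combine can be sketched
        on a one-dimensional toy error `(w - 3)^2`. Mirroring the
        ``new_delta`` computation in the body of this method, the step applied
        at each iteration is minus the learning rate times the gradient, plus
        the momentum term times the previous step; the function below is a
        hypothetical illustration, not part of yaplf:

```python
def descend(gradient, w, learning_rate=0.1, momentum_term=0.0, steps=10):
    # Each applied step is -learning_rate * gradient plus momentum_term
    # times the step applied at the previous iteration.
    last_step = 0.0
    for _ in range(steps):
        step = -learning_rate * gradient(w) + momentum_term * last_step
        w += step
        last_step = step
    return w


def gradient(w):
    # Gradient of the toy error (w - 3) ** 2, minimized at w = 3.
    return 2 * (w - 3)


plain = descend(gradient, 0.0)
with_momentum = descend(gradient, 0.0, momentum_term=0.5)
print(abs(plain - 3), abs(with_momentum - 3))
```

        With these settings the run with momentum ends closer to the minimum,
        although its trajectory overshoots `w = 3` along the way.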

        OUTPUT:

        No output. After the invocation the inferred model is available through
        the ``model`` field, in form of a ``MultilayerPerceptron`` instance.

        EXAMPLES:

        Consider the following data set summarizing the binary XOR function,
        and a ``BackpropagationAlgorithm`` instance for it:

        ::

            >>> from yaplf.data import LabeledExample
            >>> xor_sample = [LabeledExample((0, 0), (0,)),
            ... LabeledExample((0, 1), (1,)), LabeledExample((1, 0), (1,)),
            ... LabeledExample((1, 1), (0,))]
            >>> from yaplf.utility.activation import SigmoidActivationFunction
            >>> from yaplf.algorithms.neural import BackpropagationAlgorithm
            >>> alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1),
            ... threshold = True, activations = SigmoidActivationFunction(10))

        In order to actually run the algorithm it is necessary to specify a
        stopping criterion (the default behaviour only executes a single
        learning iteration, which is unlikely to get very far). To keep things
        simple, one can choose ``FixedIterationsStoppingCriterion`` so as to
        run a fixed number of iterations, say `5000`:

        ::

            >>> from yaplf.utility.stopping import \
            ... FixedIterationsStoppingCriterion
            >>> alg.run(stopping_criterion = \
            ... FixedIterationsStoppingCriterion(5000), learning_rate = .1)

        The inferred model can be inspected through the ``model`` field in the
        ``LearningAlgorithm`` object:

        ::

            >>> alg.model # random
            MultilayerPerceptron((2, 2, 1), [array([[-0.49748418, -0.48592928],
            [-0.72052151, -0.69609958]]), array([[ 0.78258238, -0.84286672]])],
            thresholds = [array([ 0.71869556,  0.24337534]),
            array([-0.36203957])], activations = SigmoidActivationFunction(10))

        One of the ways of assessing the performance of the algorithm is that
        of invoking the ``test`` function inherited from the ``Model`` class in
        order to see how each example has been classified:

        ::

            >>> alg.model.test(xor_sample, verbose = True) # random
            (0, 0) mapped to 0.0279358676426, label is (0,), error
            [ 0.00078041]
            (0, 1) mapped to 0.968320708961, label is (1,), error [ 0.00100358]
            (1, 0) mapped to 0.966511649371, label is (1,), error [ 0.00112147]
            (1, 1) mapped to 0.0429967333857, label is (0,), error
            [ 0.00184872]
            MSE 0.00118854472284
            0.0011885447228408164

        The named argument ``verbose`` activates a verbose output detailing
        how the error is distributed among the examples. Note that the output
        of ``test`` is likely to differ on each run, as the initial weights are
        chosen at random when the class constructor is called.

        Alternatively, the algorithm can be run until the error on its
        training set falls below a given threshold. This is easily attained
        using ``TrainErrorStoppingCriterion``:

        ::

            >>> from yaplf.utility.stopping import TrainErrorStoppingCriterion
            >>> alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1),
            ... activations = SigmoidActivationFunction(10))
            >>> alg.run(stopping_criterion = TrainErrorStoppingCriterion(0.01),
            ... learning_rate = .1)

        The argument in the constructor of ``TrainErrorStoppingCriterion`` sets
        the above mentioned threshold. It is also possible (and, in fact,
        methodologically sounder) to run the learning algorithm on a training
        set and stop the process when the error on a separate test set goes
        below a given threshold. This can be attained using
        ``TestErrorStoppingCriterion`` in package ``yaplf.utility.stopping``.
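
        A criterion based on the training error can be pictured as follows.
        The class is a hypothetical stand-in for
        ``TrainErrorStoppingCriterion``, applied to a one-dimensional toy error
        `(w - 3)^2`; the iteration cap is an extra safeguard added in this
        sketch, since a threshold that is never reached (e.g. because of a
        local minimum) would otherwise keep the loop running forever:

```python
class TrainErrorStop:
    # Hypothetical stand-in: stop() is True as soon as the current
    # training error falls below the given threshold.
    def __init__(self, threshold, current_error):
        self.threshold = threshold
        self.current_error = current_error  # callable returning the error

    def stop(self):
        return self.current_error() < self.threshold


w = 0.0
criterion = TrainErrorStop(0.01, lambda: (w - 3) ** 2)
iterations = 0
while not criterion.stop() and iterations < 1000:  # cap added in the sketch
    w -= 0.1 * 2 * (w - 3)  # gradient step on the toy error (w - 3) ** 2
    iterations += 1
print(iterations)  # 16
```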

        In this case, another way of evaluating the inferred model is to
        visualize it graphically. As the perceptron has two inputs, it is
        possible to call its ``plot`` method specifying a region containing
        all the patterns supplied to the learning algorithm:

        ::

            >>> alg.model.plot((0, 1), (0, 1), shading = True)

        Learning can also be monitored using the class ``ErrorTrajectory`` in
        package ``yaplf.graph.trajectory`` as follows:

        ::

            >>> from yaplf.graph.trajectory import ErrorTrajectory
            >>> alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1),
            ... activations = SigmoidActivationFunction(10))
            >>> errObs = ErrorTrajectory(alg)
            >>> alg.run(stopping_criterion = \
            ... FixedIterationsStoppingCriterion(1500))
            >>> errObs.get_trajectory(color='red', joined = True)

        In this way, at each learning iteration the inferred model is tested
        against the training set, so that the ``get_trajectory`` function
        returns a graph of the related error versus the iteration number.

        The ``run`` function has a number of named arguments tuning how the
        backpropagation algorithm infers its model, whose meaning requires a
        bit more information about how the algorithm works. During the
        initialization of ``BackpropagationAlgorithm`` and at each invocation
        of ``reset``, a multilayer perceptron is created picking its connection
        weights and its threshold values at random; subsequently, at each
        iteration an example is selected and fed to this perceptron. The
        obtained output is compared to the expected one and an error is
        computed. This error, together with other information, is used in the
        computation of a quantity `\Delta w` which will in turn be used in
        order to modify connections and thresholds. More precisely, at a given
        time `t`, for each connection weight, say `w_{ij}(t)`, the partial
        derivative `\Delta w_{ij}(t)` of half the squared error on the current
        example with respect to `w_{ij}` is computed via backpropagation, and
        the perceptron is updated so that `w_{ij}(t+1) = w_{ij}(t) - \eta
        \Delta w_{ij}(t) + \alpha (w_{ij}(t) - w_{ij}(t-1))`, the last term
        being the update applied at the previous step. This rule implements a
        local descent in the error space, so that eventually the obtained
        perceptron locally minimizes the training error. The values `\eta` and
        `\alpha` can be chosen through the following named arguments:

        - ``learning_rate`` corresponds to `\eta`, the so-called *learning
          rate*, with a default value of 0.1. The higher this value, the larger
          the local steps of the algorithm. This can speed up convergence, but
          also raises the risk of overshooting a minimum and oscillating
          around it.

        - ``momentum_term`` corresponds to `\alpha`, the so-called *momentum
          term*: adding a fraction of the previous update enlarges the actual
          step when successive updates point in the same direction and damps
          it when they alternate in sign. Its default value is 0,
          corresponding to the original version of the backpropagation
          algorithm.
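
        The computation of `\Delta w` described above can be illustrated in
        plain Python for a (2, 2, 1) perceptron with sigmoid units, checking
        one backpropagated derivative against a finite-difference estimate.
        This is a hypothetical sketch rather than yaplf's implementation: the
        helper names are invented, and the net input of a unit is assumed to
        be `\sum_j w_{ij} x_j + \theta_i`, a sign convention consistent with
        the threshold update in ``run`` but not stated in this documentation:

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, thresholds, pattern):
    # Activations of every layer, input layer included.
    layers = [list(pattern)]
    for w, theta in zip(weights, thresholds):
        x = layers[-1]
        layers.append([sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + t)
                       for row, t in zip(w, theta)])
    return layers


def half_squared_error(weights, thresholds, pattern, label):
    output = forward(weights, thresholds, pattern)[-1]
    return 0.5 * sum((o - y) ** 2 for o, y in zip(output, label))


def gradients(weights, thresholds, pattern, label):
    # Backpropagation: derivatives of half the squared error.
    layers = forward(weights, thresholds, pattern)
    # Output deltas: error times the sigmoid derivative f * (1 - f).
    delta = [(o - y) * o * (1 - o) for o, y in zip(layers[-1], label)]
    grad_w = [None] * len(weights)
    grad_t = [None] * len(weights)
    for lev in range(len(weights) - 1, -1, -1):
        x = layers[lev]  # input of the layer fed by weights[lev]
        grad_w[lev] = [[d * xj for xj in x] for d in delta]
        grad_t[lev] = list(delta)
        if lev > 0:
            # Send the deltas back through the transposed connections.
            delta = [x[j] * (1 - x[j]) *
                     sum(weights[lev][i][j] * delta[i]
                         for i in range(len(delta)))
                     for j in range(len(x))]
    return grad_w, grad_t


rng = random.Random(0)
dims = (2, 2, 1)
weights = [[[rng.uniform(-1, 1) for _ in range(n_prev)] for _ in range(n_cur)]
           for n_prev, n_cur in zip(dims[:-1], dims[1:])]
thresholds = [[rng.uniform(-1, 1) for _ in range(n)] for n in dims[1:]]
pattern, label = (1, 0), (1,)
grad_w, grad_t = gradients(weights, thresholds, pattern, label)

# Central finite difference on one hidden-layer connection weight.
h = 1e-6
weights[0][1][0] += h
e_plus = half_squared_error(weights, thresholds, pattern, label)
weights[0][1][0] -= 2 * h
e_minus = half_squared_error(weights, thresholds, pattern, label)
weights[0][1][0] += h
print(grad_w[0][1][0], (e_plus - e_minus) / (2 * h))
```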

        The following named arguments also affect the learning behaviour:

        - ``selector`` sets an iterator selecting the next example to be fed to
          the learning algorithm. Its default value cycles through the provided
          sample.

        - ``batch`` -- boolean flag setting batch learning mode, in which the
          update values `\Delta w_{ij}` are accumulated over all examples in
          the sample and subsequently used in order to modify the perceptron,
          rather than the standard online mode where the perceptron is updated
          after each single example presentation. Its default value is
          ``False``, corresponding to the online mode previously illustrated.
          It is worth noting that when the algorithm is run for a fixed number
          of iterations (i.e. through ``FixedIterationsStoppingCriterion``),
          an iteration corresponds to one example presentation in online mode
          and to the presentation of the whole sample in batch mode.

        - ``min_error`` -- error value under which no update will occur on
          connections and thresholds. This argument can be used in order to
          prevent some examples from being overlearnt at the expense of the
          remaining ones. The default value is 0, leading to updating the
          perceptron regardless of how small the error on an example is.
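
        The difference between online and batch mode, and the effect of
        ``min_error``, can be sketched on a single linear unit `y = w x`
        trained on half the squared error. The functions are hypothetical
        illustrations, not yaplf code: ``batch_epoch`` corresponds to a single
        batch-mode iteration, while ``online_epoch`` corresponds to as many
        online iterations as there are examples:

```python
def online_epoch(w, sample, learning_rate, min_error=0.0):
    # Online mode: w changes right after each example is presented.
    for x, y in sample:
        error = w * x - y
        if abs(error) < min_error:
            continue  # error already small enough: no update
        w -= learning_rate * error * x
    return w


def batch_epoch(w, sample, learning_rate, min_error=0.0):
    # Batch mode: updates are computed with the same w for all examples,
    # accumulated, and applied once at the end.
    cumul = 0.0
    for x, y in sample:
        error = w * x - y
        if abs(error) >= min_error:
            cumul -= learning_rate * error * x
    return w + cumul


sample = [(1.0, 2.0), (2.0, 4.0)]
print(batch_epoch(0.0, sample, 0.1))   # 1.0
print(online_epoch(0.0, sample, 0.1))  # about 0.92
print(online_epoch(0.0, sample, 0.1, min_error=3.0))  # about 0.8
```

        In batch mode both updates are computed with the initial `w`, whereas
        in online mode the second update already sees the weight modified by
        the first one; with ``min_error=3.0`` the first example, whose error
        has magnitude 2, triggers no update at all.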

        AUTHORS:

        - Dario Malchiodi (2010-03-31)

        """

        IterativeAlgorithm.run(self, **kwargs)
        try:
            # batch or online learning
            batch = kwargs['batch']
        except KeyError:
            batch = False

        try:
            learning_rate = kwargs['learning_rate']
        except KeyError:
            learning_rate = 0.1

        try:
            momentum_term = kwargs['momentum_term']
        except KeyError:
            momentum_term = 0

        try:
            min_error = kwargs['min_error']
        except KeyError:
            min_error = 0

        dims = transpose((self.dimensions[1:], self.dimensions[:-1]))
        last_delta = [zeros(shape) for shape in dims]
        if self.threshold:
            last_delta_threshold = [zeros(shape) \
                for shape in self.dimensions[1:]]
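        while not self.stop_criterion.stop():
            if batch:
                # cumul_delta and last_delta must be distinct lists: binding
                # both names to the same list would make the assignments to
                # last_delta overwrite the accumulated updates
                cumul_delta = [zeros(shape) for shape in dims]
                last_delta = [zeros(shape) for shape in dims]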
                if self.threshold:
                    cumul_delta_thresholds = [zeros(shape) \
                        for shape in self.dimensions[1:]]
                for elem in self.sample:
                    answer = self.model.compute(elem.pattern,
                        full_state=True, show_net=True, no_unbox=True)

                    delta = [[]] * (len(self.dimensions) - 1)

                    error = transpose(answer[-1])[0] - elem.label
                    if abs(error) < min_error:
                        continue
                    derivative = [\
                        self.model.get_activation(-1).compute_derivative(net,
                        func_value=val)
                        for (val, net)  in answer[-1]]

                    delta[-1] = error * derivative

                    for lev in range(1, len(self.dimensions) - 1):
                        derivatives = [self.model.get_activation(-lev - \
                            1).compute_derivative(net, func_value=val) \
                            for (val, net)  in answer[-lev - 1]]

                        prop_delta = \
                            dot(transpose(self.model.connections[-lev]),
                            delta[-lev])
                        delta[-lev - 1] = array(derivatives * prop_delta)

                    for lev in range(1, len(self.dimensions)):
                        new_delta = -learning_rate * outer(delta[-lev],
                           transpose(answer[-lev - 1])[0]) + \
                           momentum_term * last_delta[-lev]
                        cumul_delta[-lev] += new_delta
                        last_delta[-lev] = new_delta

                        if self.threshold:
                            new_delta = -learning_rate * delta[-lev] +\
                                momentum_term * last_delta_threshold[-lev]
                            cumul_delta_thresholds[-lev] += new_delta
                            last_delta_threshold[-lev] = new_delta

                for lev in range(1, len(self.dimensions)):
                    self.model.connections[-lev] += cumul_delta[-lev]

                    if self.threshold:
                        self.model.thresholds[-lev] += \
                            cumul_delta_thresholds[lev]

            else:
                elem = next(self.sample_selector)
                answer = self.model.compute(elem.pattern,
                    full_state=True, show_net=True, no_unbox=True)

                delta = [[]] * (len(self.dimensions) - 1)

                error = transpose(answer[-1])[0] - elem.label
                if abs(error) < min_error:
                    continue
                derivative = [\
                    self.model.get_activation(-1).compute_derivative(net,
                    func_value=val)
                    for (val, net)  in answer[-1]]

                delta[-1] = error * derivative

                for lev in range(1, len(self.dimensions) - 1):
                    derivatives = [self.model.get_activation(-lev - \
                        1).compute_derivative(net, func_value=val) \
                        for (val, net)  in answer[-lev - 1]]

                    prop_delta = dot(transpose(self.model.connections[-lev]),
                        delta[-lev])
                    delta[-lev - 1] = array(derivatives * prop_delta)

                for lev in range(1, len(self.dimensions)):
                    new_delta = -learning_rate * outer(delta[-lev],
                        transpose(answer[-lev - 1])[0]) + \
                        momentum_term * last_delta[-lev]
                    self.model.connections[-lev] += new_delta
                    last_delta[-lev] = new_delta

                    if self.threshold:
                        new_delta = -learning_rate * delta[-lev] +\
                            momentum_term * last_delta_threshold[-lev]
                        self.model.thresholds[-lev] += new_delta
                        last_delta_threshold[-lev] = new_delta

            self.notify_observers()