Example #1
def tasks():
    steps = ["intro", "one_time", "repeats", "repeats_failed", "group_names", "uuid", "futures", "priority"]
    docs = Storage()
    comments = Storage()
    docs.intro = """
#### Intro
So, here we are trying to learn (and test) web2py's scheduler.

You have to download the latest trunk scheduler to make it work (back up the current gluon/scheduler.py and replace it with the one from trunk).

This app ships with a default SQLite database, feel free to test on your preferred db engine.

All example code should work if you just prepend
``
import datetime
from gluon.contrib.simplejson import loads, dumps
sr = db.scheduler_run
sw = db.scheduler_worker
st = db.scheduler_task
``:python

DRY!

Additionally, every example uses ``task_name``, but that is not a required parameter.
It just helps **this app** to verify that all is working correctly when you press the **Verify** button.

We have 5 functions defined in models/scheduler.py (don't get confused). It should look something like this:
``
# coding: utf8
import time
from gluon.scheduler import Scheduler

def demo1(*args,**vars):
    print 'you passed args=%s and vars=%s' % (args, vars)
    return args[0]

def demo2():
    1/0

def demo3():
    time.sleep(15)
    print 1/0
    return None

def demo4():
    time.sleep(15)
    print "I'm printing something"
    return dict(a=1, b=2)

def demo5():
    time.sleep(15)
    print "I'm printing something"
    rtn = dict(a=1, b=2)


scheduler = Scheduler(db)
##or, alternatively :
#scheduler = Scheduler(db,
#                      dict(
#                        demo1=demo1,
#                        demo2=demo2,
#                        demo3=demo3,
#                        demo4=demo4,
#                        foo=demo5
#                        )
#                      )
``:python

So, we have:
 -  demo1 : standard function, with some printed output, returning the first arg
 -  demo2 : never returns, throws an exception
 -  demo3 : sleeps for 15 seconds, tries to print something, throws an exception
 -  demo4 : sleeps for 15 seconds, prints something, returns a dictionary
 -  demo5 : sleeps for 15 seconds, prints something, doesn't return anything

The scheduler is instantiated with the db only. Optionally, you can pass a dictionary
mapping strings to functions.
In the latter case, each function is "assigned" to a string that is the function name,
except for function demo5, which we "assigned" to 'foo'.

All interaction with the scheduler is done by acting on the scheduler_* tables.
    """

    docs.one_time = """
#### One time only

Okay, let's start with something simple: a function that needs to run one time only.

``
st.insert(task_name='one_time_only', function_name='demo4')
``:python

Instructions:
 - Push "Clear All"
 - Push "Start Monitoring"
 - If you haven't already, start a worker in another shell ``web2py.py -K w2p_scheduler_tests``
 - Wait a few seconds, a worker shows up
 - Push "Queue Task"
 - Wait a few seconds

What you should see:
 - one worker is **ACTIVE**
 - one scheduler_task gets **QUEUED**, goes into **RUNNING** for a while and then becomes **COMPLETED**
 - when the task is **RUNNING**, a scheduler_run record pops up (**RUNNING**)
 - When the task is **COMPLETED**, the scheduler_run record is updated to show a **COMPLETED** status.

Then, click "Stop Monitoring" and "Verify"
    """
    comments.one_time = """
So, we got a task executed by scheduler, yeeeeahh!

Please note how much data you get to inspect the execution:
###### scheduler_task
 - start_time is when you queued the task
 - task_name is useful for retrieving the results later ``db(sr.scheduler_task.id == st.id)(st.task_name == 'one_time_only')(st.status == 'COMPLETED').select(sr.result, sr.output)``:python
 - task gets a ``uuid`` by default
###### scheduler_run
 - ``result`` is in json format
 - ``output`` is the stdout, so you can watch your nice "print" statements
 - ``start_time`` is when the task started
 - ``stop_time`` is when the task stopped
 - ``worker_name`` gets the worker name that processed the task
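
For example, a minimal sketch (following the query above) that decodes the json ``result`` once the task has completed:
``
row = db(sr.scheduler_task.id == st.id)(st.task_name == 'one_time_only').select(sr.result, sr.output).first()
print loads(row.result)  # -> {'a': 1, 'b': 2}, demo4's return value
print row.output         # the captured "print" statements
``:python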
    """
    docs.repeats = """
#### Repeating task

Let's say we want to run the demo1 function with some args and vars, 2 times.
``
st.insert(task_name="repeats", function_name='demo1', args=dumps(['a','b']), vars=dumps(dict(c=1, d=2)), repeats=2, period=30)
``

Instructions (same as before):
 - Push "Clear All"
 - Push "Start Monitoring"
 - If you haven't already, start a worker in another shell ``web2py.py -K w2p_scheduler_tests``
 - Wait a few seconds, a worker shows up
 - Push "Queue Task"
 - Wait a few seconds


Verify that:
 - one worker is **ACTIVE**
 - one scheduler_task gets **QUEUED**, goes into **RUNNING** for a while
 - a scheduler_run record is created, goes **COMPLETED**
 - task gets **QUEUED** again for a second round
 - a new scheduler_run record is created
 - task becomes **COMPLETED**
 - a second scheduler_run record is created

Then, click "Stop Monitoring".
    """
    comments.repeats = """
So, we got a task executed twice automatically, yeeeeahh!

###### scheduler_task
 - times_run is 2
 - last_run_time is when the second execution started
###### scheduler_run
 - in the output, args and vars got printed ok.
 - start_time of the second execution is ``period`` seconds after the start_time of the first run
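
A quick sketch to check those fields once both executions are done:
``
task = db(st.task_name == 'repeats').select().first()
print task.times_run      # -> 2
print task.last_run_time  # -> when the second execution started
``:python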
    """

    docs.repeats_failed = """
#### Repeats Failed
We want to run a function once, but allow it to fail once.
That is, you want the function to "retry" if the first attempt fails.
Remember, repeats_failed==1 lets a task fail only once; that is the default behaviour.
If you want the task to be retried once AFTER it has failed, you need to specify repeats_failed=2.
We'll enqueue demo2, which we know will fail in both runs, just to check that everything
works as expected (i.e. it gets re-queued only one time after the first FAILED run)

``
st.insert(task_name='repeats_failed', function_name='demo2', repeats_failed=2, period=30)
``
    """
    docs.expire = """
#### Expired status
To better understand the use of the ``stop_time`` parameter we're going to schedule
a function with stop_time < now. The task will have the status **QUEUED**, but as soon
as a worker sees it, it will set its status to **EXPIRED**.
``
stop_time = request.now - datetime.timedelta(seconds=60)
st.insert(task_name='expire', function_name='demo4', stop_time=stop_time)
``
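
Once a worker has seen it, a quick check should show the new status (a sketch, reusing the ``st`` shortcut from the intro):
``
print db(st.task_name == 'expire').select(st.status).first().status  # -> EXPIRED
``:python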
    """
    docs.priority = """
#### Priority
Although there is no explicit priority management for tasks, if you'd like to execute
a task "at the top of the list", for one-time-only tasks you can force the
``next_run_time`` parameter to something far in the past (according to your preferences).
A task gets **ASSIGNED** to a worker, and the worker picks up (and executes) first the tasks with
the minimum ``next_run_time``.

``
next_run_time = request.now - datetime.timedelta(seconds=60)
st.insert(task_name='priority1', function_name='demo1', args=dumps(['scheduled_first']))
st.insert(task_name='priority2', function_name='demo1', args=dumps(['scheduled_second']), next_run_time=next_run_time)
``
    """
    docs.returns_null = """
#### Tasks with no return value
Sometimes you want a function to run, but you're not interested in the return value
(because you save it in another table, or you simply don't mind the results).
Well, there is no reason to keep a record in the scheduler_run table!
So, by default, if a function doesn't return anything, its scheduler_run record
will be automatically deleted.
The record gets created anyway while the task is **RUNNING**, because it's a way to
tell that a function is taking some time to execute, and because if the task fails
(timeout or exception) the record is needed to see what went wrong.
We'll queue 2 functions, both with no return values; demo3 also generates an exception.
``
st.insert(task_name='no_returns1', function_name='demo5')
st.insert(task_name='no_returns2', function_name='demo3')
``
    """

    return dict(docs=docs, comments=comments)
Example #2
def tasks():
    steps = [
        'intro',
        'one_time', 'repeats', 'repeats_failed', 'prevent_drift',
        'group_names', 'uuid', 'futures', 'priority',
        'immediate'
        ]
    docs = Storage()
    comments = Storage()
    docs.intro = """
#### Intro
So, here we are trying to learn (and test) web2py's scheduler. This app always
documents the latest scheduler available in web2py, so make sure you have downloaded
the latest scheduler from the "master" repo
(back up the current gluon/scheduler.py and replace it with the one
on "master", just to be safe).

This app ships with a default SQLite database, feel free to test on your preferred db engine.

All example code should work if you just prepend
``
import datetime
from gluon.contrib.simplejson import loads, dumps
sr = db.scheduler_run
sw = db.scheduler_worker
st = db.scheduler_task
``:python

DRY!

Additionally, every example uses ``task_name``, but that is not a required parameter.
It just helps **this app** to verify that all is working correctly when you press the **Verify** button.

We have 7 functions defined in models/scheduler.py (don't get confused). It should look something like this:
``
# coding: utf8
import time
from gluon.scheduler import Scheduler

def demo1(*args,**vars):
    print 'you passed args=%s and vars=%s' % (args, vars)
    return args[0]

def demo2():
    1/0

def demo3():
    time.sleep(15)
    print 1/0
    return None

def demo4():
    time.sleep(15)
    print "I'm printing something"
    return dict(a=1, b=2)

def demo5():
    time.sleep(15)
    print "I'm printing something"
    rtn = dict(a=1, b=2)

def demo6():
    time.sleep(5)
    print '50%'
    time.sleep(5)
    print '!clear!100%'
    return 1

import random
def demo7():
    time.sleep(random.randint(1,15))
    print W2P_TASK, request.now
    return W2P_TASK.id, W2P_TASK.uuid

scheduler = Scheduler(db)
##or, alternatively :
#scheduler = Scheduler(db,
#                      dict(
#                        demo1=demo1,
#                        demo2=demo2,
#                        demo3=demo3,
#                        demo4=demo4,
#                        foo=demo5
#                        )
#                      )
``:python

So, we have:
 -  demo1 : standard function, with some printed output, returning the first arg
 -  demo2 : never returns, throws an exception
 -  demo3 : sleeps for 15 seconds, tries to print something, throws an exception
 -  demo4 : sleeps for 15 seconds, prints something, returns a dictionary
 -  demo5 : sleeps for 15 seconds, prints something, doesn't return anything
 -  demo6 : sleeps for 5 seconds, prints something, sleeps 5 more seconds, prints something else, returns 1
 -  demo7 : sleeps for a random time, prints and returns W2P_TASK.id and W2P_TASK.uuid

The scheduler is instantiated with the db only. Optionally, you can pass a dictionary
mapping strings to functions.
In the latter case, each function is "assigned" to a string that is the function name,
except for function demo5, which we "assigned" to 'foo'.

All interactions with the scheduler are done using the public API.

#### Exposed API

##### Queue Task
- ``scheduler.queue_task(function, pargs=[], pvars={}, **kwargs)`` : accepts a lot of arguments, to make your life easier....
 -- ``function`` : required. This can be a string as ``'demo2'`` or directly the function, i.e. you can use ``scheduler.queue_task(demo2)``
 -- ``pargs`` : p stands for "positional". pargs will accept your args as a list, without the need to jsonify them first.
  ---  ``scheduler.queue_task(demo1, [1,2])``
  ... does the exact same thing as
  ... ``st.validate_and_insert(function_name = 'demo1', args=dumps([1,2]))``
  ... and in far fewer characters
  ... **NB**: if you do ``scheduler.queue_task(demo1, [1,2], args=dumps([2,3]))`` , ``args`` will prevail and the task will be queued with **2,3**
 -- ``pvars`` : as with ``pargs``, will accept your vars as a dict, without the need to jsonify them first.
  --- ``scheduler.queue_task(demo1, [], {'a': 1, 'b' : 2})`` or ``scheduler.queue_task(demo1, pvars={'a': 1, 'b' : 2})``
  ... does the exact same thing as
  ... ``st.validate_and_insert(function_name = 'demo1', vars=dumps({'a': 1, 'b' : 2}))``
  ... **NB**:  if you do ``scheduler.queue_task(demo1, None, {'a': 1, 'b': 2}, vars=dumps({'a': 2, 'b' : 3}))`` , ``vars`` will prevail and the task will be queued with **{'a': 2, 'b' : 3}**
 -- ``kwargs`` : all other scheduler_task columns can be passed as keywords arguments, e.g. :
  ... ``
       scheduler.queue_task(
          demo1, [1,2], {'a': 1, 'b': 2},
          repeats = 0,
          period = 180,
          ...
       )``:python
 -- since version 2.4.1, if you pass an additional parameter ``immediate=True`` it will force the main worker to reassign tasks. Until 2.4.1, the worker checked for new tasks every 5 cycles (so, every ``5*heartbeat`` seconds). If you had an app that needed to check frequently for new tasks, to get a "snappy" behaviour you were forced to lower the ``heartbeat`` parameter, putting the db under pressure for no reason. With ``immediate=True`` you can force the check for new tasks: it will happen at most ``heartbeat`` seconds after the task is queued
The method returns the result of validate_and_insert, with the ``uuid`` of the task you queued (either the one you passed or the auto-generated one).

``<Row {'errors': {}, 'id': 1, 'uuid': '08e6433a-cf07-4cea-a4cb-01f16ae5f414'}>``

If there are errors (e.g. you used ``period = 'a'``), you'll get the result of the validation, and id and uuid will be None

``<Row {'errors': {'period': 'enter an integer greater than or equal to 0'}, 'id': None, 'uuid': None}>``
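
So a defensive queueing sketch might look like this:
``
ret = scheduler.queue_task(demo1, [1,2], period='a')  # invalid period
if ret.errors:
    print 'task not queued:', ret.errors
else:
    print 'queued with uuid %s' % ret.uuid
``:python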
##### Task status
- ``task_status(self, ref, output=False)``
 -- ``ref`` can be either:
  --- an integer --> lookup will be done by scheduler_task.id
  --- a string --> lookup will be done by scheduler_task.uuid
  --- a query --> lookup as you wish (as in db.scheduler_task.task_name == 'test1')
  --- **NB**: in the case of a query, only the last scheduler_run record will be fetched
 -- ``output=True`` will include the scheduler_run record too, plus a ``result`` key holding the decoded result
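
A minimal usage sketch (the ``task_name`` here is just illustrative):
``
ret = scheduler.queue_task(demo4, task_name='status_check')
print scheduler.task_status(ret.id)    # lookup by scheduler_task.id
print scheduler.task_status(ret.uuid)  # lookup by uuid
## query lookup: returns the last scheduler_run record plus the decoded result
print scheduler.task_status(st.task_name == 'status_check', output=True)
``:python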
"""

    docs.one_time = """
#### One time only

Okay, let's start with something simple: a function that needs to run one time only.

``
scheduler.queue_task(demo4, task_name='one_time_only')
``:python

Instructions:
 - Push "Clear All"
 - Push "Start Monitoring"
 - If you haven't already, start a worker in another shell ``web2py.py -K w2p_scheduler_tests``
 - Wait a few seconds, a worker shows up
 - Push "Queue Task"
 - Wait a few seconds

What you should see:
 - one worker is **ACTIVE**
 - one scheduler_task gets **QUEUED**, goes into **RUNNING** for a while and then becomes **COMPLETED**
 - when the task is **RUNNING**, a scheduler_run record pops up (**RUNNING**)
 - When the task is **COMPLETED**, the scheduler_run record is updated to show a **COMPLETED** status.

Then, click "Stop Monitoring" and "Verify"
    """
    comments.one_time = """
So, we got a task executed by scheduler, yeeeeahh!

Please note how much data you get to inspect the execution:
###### scheduler_task
 - start_time is when you queued the task
 - task_name is useful for retrieving the results later ``db(sr.scheduler_task.id == st.id)(st.task_name == 'one_time_only')(st.status == 'COMPLETED').select(sr.run_result, sr.run_output)``:python
 - task gets a ``uuid`` by default
###### scheduler_run
 - ``run_result`` is in json format
 - ``run_output`` is the stdout, so you can watch your nice "print" statements
 - ``start_time`` is when the task started
 - ``stop_time`` is when the task stopped
 - ``worker_name`` gets the worker name that processed the task
    """
    docs.repeats = """
#### Repeating task

Let's say we want to run the demo1 function with some args and vars, 2 times.
``
scheduler.queue_task(demo1, ['a','b'], dict(c=1, d=2), task_name="repeats", repeats=2, period=10)
``

Instructions (same as before):
 - Push "Clear All"
 - Push "Start Monitoring"
 - If you haven't already, start a worker in another shell ``web2py.py -K w2p_scheduler_tests``
 - Wait a few seconds, a worker shows up
 - Push "Queue Task"
 - Wait a few seconds


Verify that:
 - one worker is **ACTIVE**
 - one scheduler_task gets **QUEUED**, goes into **RUNNING** for a while
 - a scheduler_run record is created, goes **COMPLETED**
 - task gets **QUEUED** again for a second round
 - a new scheduler_run record is created
 - task becomes **COMPLETED**
 - a second scheduler_run record is created

Then, click "Stop Monitoring".
    """
    comments.repeats = """
So, we got a task executed twice automatically, yeeeeahh!

###### scheduler_task
 - times_run is 2
 - last_run_time is when the second execution started
###### scheduler_run
 - in the output, args and vars got printed ok.
 - start_time of the second execution is ``period`` seconds after the start_time of the first run
    """

    docs.repeats_failed = """
#### Retry Failed

We want to run a function once, but allow it to fail once.
That is, you want the function to "retry" if the first attempt fails.
We'll enqueue demo2, which we know will fail in both runs, just to check that everything
works as expected (i.e. it gets re-queued only one time after the first FAILED run)

``
scheduler.queue_task(demo2, task_name='retry_failed', retry_failed=1, period=10)
``
    """
    docs.repeats_failed_consecutive = """
#### Retry Failed, round #2: understanding "consecutive failures"

demo8 fails the first two times it runs, and then completes (it's not in the model
listing above; see the sketch after this paragraph).
As soon as a repeating task completes, the "failed counter" gets reset.
This means that if we queue demo8 with two allowed failures and 2 repeats,
we'll see 6 executions, even though the total failures sum up to 4.
This is useful for tasks that depend on external resources that may be
occasionally unavailable, when you still want the task to be executed as long as
not too many failures occur consecutively.
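
A hypothetical demo8 could track its attempts on disk (a sketch under that assumption, not the app's actual code):
``
import os

def demo8():
    counter = '/tmp/demo8_counter'  # hypothetical scratch file
    n = int(open(counter).read()) if os.path.exists(counter) else 0
    open(counter, 'w').write(str(n + 1))
    if n < 2:
        1/0  # fail the first two consecutive executions
    os.remove(counter)  # completed: reset so the next repeat starts clean
    return 'completed'
``:python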

``
scheduler.queue_task(demo8, task_name='retry_failed_consecutive', retry_failed=2, repeats=2, period=10)
``
    """
    docs.expire = """
#### Expired status

To better understand the use of the ``stop_time`` parameter we're going to schedule
a function with stop_time < now. The task will have the status **QUEUED**, but as soon
as a worker sees it, it will set its status to **EXPIRED**.
``
stop_time = request.now - datetime.timedelta(seconds=60)
scheduler.queue_task(demo4, task_name='expire', stop_time=stop_time)
``
    """
    docs.priority = """
#### Priority

Although there is no explicit priority management for tasks, if you'd like to execute
a task "at the top of the list", for one-time-only tasks you can force the
``next_run_time`` parameter to something far in the past (according to your preferences).
A task gets **ASSIGNED** to a worker, and the worker picks up (and executes) first the tasks with
the minimum ``next_run_time`` in the set.

``
next_run_time = request.now - datetime.timedelta(seconds=60)
scheduler.queue_task(demo1, ['scheduled_first'], task_name='priority1')
scheduler.queue_task(demo1, ['scheduled_second'], task_name='priority2', next_run_time=next_run_time)
``
    """
    docs.returns_null = """
#### Tasks with no return value

Sometimes you want a function to run, but you're not interested in the return value
(because you save it in another table, or you simply don't mind the results).
Well, there is no reason to keep a record in the scheduler_run table!
So, by default, if a function doesn't return anything, its scheduler_run record
will be automatically deleted.
The record gets created anyway while the task is **RUNNING**, because it's a way to
tell that a function is taking some time to execute, and because if the task fails
(timeout or exception) the record is needed to see what went wrong.
We'll queue 2 functions, both with no return values; demo3 also generates an exception.
``
scheduler.queue_task(demo5, task_name='no_returns1')
scheduler.queue_task(demo3, task_name='no_returns2')
``
    """
    docs.timeouts = """
#### Tasks that will be terminated because they run too long

By default, the ``timeout`` parameter is set to 60 seconds. This means that if a
task takes more than 60 seconds to return, it is terminated and its status becomes
**TIMEOUT**.

We'll queue the function demo4 with a ``timeout`` parameter of 5 seconds, and then the
same function again without a timeout.
``
scheduler.queue_task(demo4, task_name='timeouts1', timeout=5)
scheduler.queue_task(demo4, task_name='timeouts2')
``
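
Once both have run, a quick sketch to compare the outcomes:
``
for row in db(st.task_name.startswith('timeouts')).select(st.task_name, st.status):
    print row.task_name, row.status  # expect TIMEOUT for the first, COMPLETED for the second
``:python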


    """
    docs.percentages = """
#### Reporting percentages

A special "word" encountered in the print statements of your functions clear all
the previous output. That word is ``!clear!``.
This, coupled with the ``sync_output`` parameter, allows to report percentages
a breeze. The function ``demo6`` sleeps for 5 seconds, outputs ``50%``.
Then, it sleeps other 5 seconds and outputs ``100%``. Note that the output in the
scheduler_run table is synced every 2 seconds and that the second print statement
that contains ``!clear!100%`` gets the ``50%`` output cleared and replaced
by ``100%`` only.

``
scheduler.queue_task(demo6, task_name='percentages', sync_output=2)
``
"""
    docs.immediate = """
#### The ``immediate`` parameter

You may have noticed that a few seconds pass between the moment you queue a task and the moment it gets picked up by a worker.
That's because the scheduler - to alleviate the pressure on the db - checks for new tasks every 5 cycles.

This means that tasks get **ASSIGNED** every ``5*heartbeat`` seconds. The heartbeat is 3 seconds by default, so if you're very unlucky, you may wait **AT MOST** 15 seconds
between queueing the task and the task being picked up.

This led developers to lower the heartbeat parameter to get a smaller "timeframe" of inactivity, putting the db under pressure for no reason.

Since 2.4.1, this is not needed anymore. You can force the scheduler to inspect the scheduler_task table for new tasks and set them to **ASSIGNED**
without waiting 5 cycles: if your worker is not busy processing already queued tasks, the new task will wait **AT MOST** 3 seconds.
This happens:
 - if you set the status of the worker with ``is_ticker=True`` to **PICK**
 - if you pass the ``immediate=True`` parameter to ``queue_task`` (it will set the worker status to **PICK** automatically)

------
Watch out: if you need to queue a lot of tasks, just do it, and use the ``immediate=True`` parameter only on the last one: there is no need to update the worker status multiple times ^_^
------
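
For instance, a minimal sketch of that pattern (the task names are just illustrative):
``
for i in range(10):
    scheduler.queue_task(demo4, task_name='bulk_%s' % i,
                         immediate=(i == 9))  # wake the worker once, on the last task
``:python

And here's the task we'll queue for this test: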
``
scheduler.queue_task(demo1, ['a','b'], dict(c=1, d=2), task_name="immediate_task", immediate=True)
``

Instructions (same as before):
 - Push "Clear All"
 - Push "Start Monitoring"
 - If you haven't already, start a worker in another shell ``web2py.py -K w2p_scheduler_tests``
 - Wait a few seconds, a worker shows up
 - Push "Queue Task"
 - Wait a few seconds


Verify that:
 - the task gets **ASSIGNED** within 3 seconds
 - one scheduler_task gets **QUEUED**, goes into **RUNNING** for a while
 - a scheduler_run record is created, goes **COMPLETED**
 - task becomes **COMPLETED**

Then, click "Stop Monitoring".
"""
    docs.w2p_task = """
#### W2P_TASK

From 2.4.1, the scheduler injects a W2P_TASK global variable holding the ``id`` and the ``uuid`` of the task being executed.
You can use it to coordinate tasks, or to do some specialized logging.

``
scheduler.queue_task(demo7, task_name='task_variable')
``
"""

    docs.prevent_drift = """
#### The ``prevent_drift`` parameter

The scheduler has replaced a lot of ``cron`` facilities around, but the differences are somewhat subtle in some cases.
With a repeating task, by default the scheduler assumes that you want at least ``period`` seconds to pass
between one execution and the following one. You need to remember that the scheduler picks up tasks as soon as possible:
this means that a task with a specific ``start_time`` will be picked up only if ``start_time`` is in the past.
Usually only a few seconds go by, and this is not an issue, but if you're scheduling a task to start at 9:00AM every morning,
you'll see the start_times of the executions slowly ''drifting'' from 9:00AM to 9:01AM.
This is because if the scheduler is busy processing tasks at 9:00AM the first time, your task will be picked up at 9:01AM,
and on the next round the scheduler will pick up the task only after 9:01AM has passed.
From 2.8.3, you can pass a ``prevent_drift`` parameter that makes the scheduler calculate the times between executions
more like cron does, i.e. based on the ``start_time`` instead of the actual execution time of the task.
Beware! This means that the guarantee that at least ``period`` seconds pass between executions no longer holds.

``
scheduler.queue_task(demo1, ['a','b'], dict(c=1, d=2), task_name="prevent_drift", repeats=2, period=10, prevent_drift=True)
``

"""

    docs.stop_task = """
#### The ``stop_task`` method

If you need to stop a runaway task, the only way is to tell the worker to terminate the running process. Asking users
to retrieve the worker currently processing the task and update its status to ``STOP_TASK`` was "too much",
so there's an experimental API for it.

NB: if the task is RUNNING, it will be terminated, meaning that its status will be set to FAILED. If the task is QUEUED,
its stop_time will be set to "now", the enabled flag will be set to False, and the status to STOPPED.

``stop_task`` takes either an integer or a string: if it's an integer it's assumed to be the id, if it's a string it's assumed to
be a uuid.

So, given that the runaway task has been queued with
``
scheduler.queue_task(demo6, task_name='stop_task', uuid='stop_task')
``
you can stop it with
``
scheduler.stop_task('stop_task')
``

Instructions (same as before):
 - Push "Clear All"
 - Push "Start Monitoring"
 - If you haven't already, start a worker in another shell ``web2py.py -K w2p_scheduler_tests``
 - Wait a few seconds, a worker shows up
 - Push "Queue Task"
 - Wait a few seconds
 - Push "Stop Task"


Verify that:
 - the task gets **STOPPED** or **FAILED**, depending on whether it was running or not

Then, click "Stop Monitoring".
"""

    return dict(docs=docs, comments=comments)