Example #1
    wrapper = JobBlock(
        "entry job",
        """
        this job demonstrates how to utilize variablized configuration
    """,
    )
    wrapper.add_plan(Job.INIT_JOB, Job.START, "hello Serious")
    wrapper.add_plan("hello Serious", Job.DONE, "hello Kidding")
    wrapper.add_plan("hello Kidding", Job.DONE, "Serious")
    wrapper.add_plan("Serious", Job.DONE, "Kidding")
    wrapper.add_plan("Kidding", Job.DONE, Job.LAST_JOB)

    """
    same as previous tutorial
    but we declare the output, 'msg_to_[name]',which represent the message to be kept.
    the callback are also modified.
    """
    # ==
    j_temp = JobNode(id="hello template", desc="say hello to someone")
    j_temp.need_input("msg", "hello! Mr.[name]")
    j_temp.need_output("msg_to_[name]")
    j_temp.set_callback(hello_job)

    """
    remember we mentioned in the tutorial_04 that all the inputs should be
    explicitly declared. same as output. actually, it's fine if you don't
    declare the outputs; the process will still be executed correctly.
    however, this strategy is trying to improve the readability of the code.
    a person just take your code may be not familiar with the flow. the declared
    outputs will be listed in the generated document and help the folk to catch
    the key concepts of the job.
    """

    """
Example #2
    Job.DELEGATEES['my_shell'] = Shell()

    wrapper = JobBlock(
        'entry job', '''
        this job demonstrates how to use delegatees, say DFS or Pig
    ''')
    wrapper.add_plan(Job.INIT_JOB, Job.START, 'hadoop delegatee')
    wrapper.add_plan('hadoop delegatee', Job.DONE, 'wrong command')
    wrapper.add_plan('wrong command', Job.DONE, Job.LAST_JOB)

    '''
    prepare the jobs
    '''
    j = JobNode(id='hadoop delegatee', desc='''
        cat some file on the dfs (to run this tutorial, you have to prepare
        your own data on the dfs)
    ''')
    j.set_callback(delegated_job)
    wrapper.add_sub_job(j)
    # ==
    j = JobNode(id='wrong command', desc='''
        execute some erroneous command
    ''')
    j.set_callback(failed_delegated_job)
    wrapper.add_sub_job(j)

    '''
    run this tutorial on the Hadoop system
    '''
    # ==
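    '''
    conceptually, a delegatee is an object the flow engine hands work off
    to. a minimal stand-in for a shell delegatee (an illustration only, not
    this library's actual Shell class) could look like:
    '''
    import subprocess

    class ShellSketch(object):
        def run(self, cmd):
            # run the command line and hand the exit code back to the job,
            # so a non-zero status can drive the 'wrong command' branch
            return subprocess.call(cmd, shell=True)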
Example #3
    from copy import deepcopy  # needed for copying the template job below

    wrapper = JobBlock(
        'entry job', '''
        this job demonstrates how to use the configuration mechanism for input data
    ''')
    wrapper.add_plan(Job.INIT_JOB, Job.START, 'hello Serious')
    wrapper.add_plan('hello Serious', Job.DONE, 'hello Kidding')
    wrapper.add_plan('hello Kidding', Job.DONE, Job.LAST_JOB)

    '''
    first, we build a template/prototype job for the hello jobs and assign
    a key-value pair as input. the input can be accessed in the callback
    via self.get_input(<key_of_the_input>). note that we bracket the name
    in the config value; it's a variablized config, which we explain later.
    '''
    # ==
    j_temp = JobNode(id='template', desc='say hello to someone')
    j_temp.need_input('msg', 'hello! Mr.[name]')
    j_temp.set_callback(hello_job)

    '''
    instead of directly adding the template job into the wrapper
    '''
    # wrapper.add_sub_job(j_temp)

    '''
    we make two copies of the template and give each the correct id and
    description. then, we assign the name to each job. you may guess the
    result - the input of the template job, "msg", is "completed" by
    replacing "[name]" with the actual value we assign to each job.
    '''
    # ==
    j = deepcopy(j_temp)
    j.id = 'hello Serious'
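    '''
    the bracket substitution described above can be pictured with plain
    python (a stand-alone illustration, not this library's implementation):
    '''
    import re

    def resolve_sketch(template, variables):
        # replace each [key] in the template with its assigned value, e.g.
        # resolve_sketch('hello! Mr.[name]', {'name': 'Serious'})
        #   -> 'hello! Mr.Serious'
        return re.sub(r'\[(\w+)\]',
                      lambda m: str(variables[m.group(1)]),
                      template)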
Example #4
    wrapper.add_plan(Job.INIT_JOB, Job.START, "job0")
    wrapper.add_plan("job0", Job.DONE, "job1")
    wrapper.add_plan("job1", Job.DONE, Job.LAST_JOB)

    """
    now we start to plan the detail of each job.
    each job should have a id and a paragraph of desc(ription) which will be
    generated into document and you won't be bother to prepare any other document.
    this mechanism helps the code to be kept alive.
    the job we need here are some very simple job. let's say we wanna print
    something in each job, so we don't need to prepare any input. (we leave this
    to other tutorial codes.) so we assign a "callback" method, normal_job, to the
    job. now you could check the callbacks in the beginning of this code.
    """
    # ==
    j = JobNode(id="job0", desc="desc0")
    j.set_callback(normal_job)
    wrapper.add_sub_job(j)
    # ==
    j = JobNode(id="job1", desc="desc1")
    j.set_callback(normal_job)
    wrapper.add_sub_job(j)
    # ==

    """
    things are almost done.
    all we need to do is to trigger the execution!
    check the result to re-exame the flow of the process
    """
    # ==
    job_id, state = wrapper.execute()
Example #5
    # ==
    j = ParallelJobBlock(id='para block1', desc='para block1')
    j.add_papallel_plan('job0', 'job1')
    wrapper.add_sub_job(j)

    '''
    then, we define the inner JobNodes.
    this time, we let each job print something and then sleep, a few times
    over. because both parallel jobs will print messages, sharing one
    buffer would result in a mess. therefore, the flow engine prepares one
    buffer for each parallel job. at the end, the parent job dumps the
    children's buffers sequentially (first done, first dumped).
    '''
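    '''
    for reference, a minimal sketch of such a print-and-sleep callback.
    the bound-self signature and the Job.DONE return value are assumptions
    for illustration.
    '''
    import time

    def lazy_job_sketch(self):
        # print a few messages, sleeping in between, so the two parallel
        # jobs interleave in time but not in the output buffers
        for i in range(3):
            print('%s: message %d' % (self.id, i))
            time.sleep(1)
        return Job.DONE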
    # ==
    j_sub = JobNode(id='job0', desc='desc0')
    j_sub.set_callback(lazy_job)
    j.add_sub_job(j_sub)
    # ==
    j_sub = JobNode(id='job1', desc='desc1')
    j_sub.set_callback(lazy_job)
    j.add_sub_job(j_sub)
    # ==

    '''
    check the result to re-examine the flow of the process.
    '''
    # ==
    job_id, state = wrapper.execute()
    #raw_input()
Example #6
    wrapper = JobBlock(
        'entry job', '''
        this job demonstrates how to use the dry run mechanism
    ''')
    wrapper.add_plan(Job.INIT_JOB, Job.START, 'foo')
    wrapper.add_plan('foo', Job.DONE, 'block')
    wrapper.add_plan('block', Job.DONE, 'fob')
    wrapper.add_plan('fob', Job.DONE, Job.LAST_JOB)

    '''
    now, we flip a secret switch to put the whole process in dry run mode
    '''
    wrapper.set_dry_run(True)

    '''
    prepare the sea of jobs
    '''
    j = JobNode(id='foo', desc=''' foo ''')
    j.set_callback(foo_job)
    wrapper.add_sub_job(j)
    # ==
    j = JobBlock(id='block', desc=''' block ''')
    j.add_plan(Job.INIT_JOB, Job.START, 'bar')
    j.add_plan('bar', Job.DONE, 'foobar')
    j.add_plan('foobar', Job.DONE, Job.LAST_JOB)
    # --
    j_sub = JobNode(id='bar', desc=''' bar ''')
    j_sub.set_callback(foo_job)
    j.add_sub_job(j_sub)
    # --
    j_sub = JobNode(id='foobar', desc=''' foobar ''')
    j_sub.set_callback(foobar_job)
    j.add_sub_job(j_sub)
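    '''
    conceptually, a dry run walks the plan without doing the real work. a
    flow engine could gate each callback roughly like this (an illustration
    of the idea, not this library's internals; the callback attribute is
    hypothetical):
    '''
    def run_job_sketch(job, dry_run):
        if dry_run:
            # only announce what would happen, then pretend success so the
            # plan keeps advancing through 'foo' -> 'block' -> 'fob'
            print('[dry run] would execute: %s' % job.id)
            return Job.DONE
        return job.callback()  # hypothetical attribute holding the callback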
Example #7
    for key in configs_for_jobs.keys():
        Job.set_global(key, CFG[key])

    wrapper = JobBlock(
        'entry job', '''
        this job demonstrates how to use the config management module
    ''')
    wrapper.add_plan(Job.INIT_JOB, Job.START, 'foo')
    wrapper.add_plan('foo', Job.DONE, Job.LAST_JOB)
    '''
    we can get the configs we just set as globals by giving the key without
    a value, or we can embed them in some other input.

    here we also introduce another usage of output:
    in tutorial_04, we set the key of the output without a value; that's a
    kind of declaration to exclaim 'we will put some value with that key as
    the output' (and later jobs can access it as input).
    this time, we do give a value to the output key, because we want the
    job to output something to the path we expect.
    '''
    j = JobNode(id='foo', desc=''' foo ''')
    j.need_input('a very long path blah blah')
    j.need_input(
        'composite path',
        '[another very long path blah blah]/append_with_a_sub_directory')
    j.need_output('output_path', '[yet another very long path blah blah]')
    j.set_callback(foo_job)
    wrapper.add_sub_job(j)

    job_id, state = wrapper.execute()
    #raw_input()
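    '''
    the global-config lookup described above can be pictured with a tiny
    registry (a stand-in for illustration, not this library's
    implementation of Job.set_global / need_input):
    '''
    _GLOBALS_SKETCH = {}

    def set_global_sketch(key, value):
        _GLOBALS_SKETCH[key] = value

    def resolve_input_sketch(key, value=None):
        # a key given without a value falls back to the global registry
        return _GLOBALS_SKETCH[key] if value is None else value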
Example #8
    '''
    first, following the top-down design strategy, we define a JobBlock,
    which is like the wrapper, with its own plan.
    '''
    # ==
    j = JobBlock(id='block1', desc='block1')
    j.add_plan(Job.INIT_JOB, Job.START, 'job0')
    j.add_plan('job0', Job.DONE, 'job1')
    j.add_plan('job1', Job.DONE, Job.LAST_JOB)
    wrapper.add_sub_job(j)

    '''
    then, we define the inner JobNodes (same as previous tutorial)
    '''
    # ==
    j_sub = JobNode(id='job0', desc='desc0')
    j_sub.set_callback(normal_job)
    j.add_sub_job(j_sub)
    # ==
    j_sub = JobNode(id='job1', desc='desc1')
    j_sub.set_callback(normal_job)
    j.add_sub_job(j_sub)
    # ==

    '''
    BTW, here's a small tip.
    while designing a large flow, you may want to keep your code
    well-organized by putting related things together. but sometimes you
    can't assign the value you want right after the job is initiated,
    because the value has to be calculated/generated later. we provide the
    flexibility to delay that manipulation, as sketched below.
    '''
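    # in plain python, deferral can simply mean keeping a reference to the
    # job and filling the value in once it is known; the library's own
    # deferral mechanism is not shown in this excerpt, so this use of
    # need_input() at a later point is an assumption for illustration
    j_late = JobNode(id='job2', desc='desc2')
    # ... later, after the value has been computed ...
    computed_msg = 'value computed at a later point'
    j_late.need_input('msg', computed_msg)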