Example #1
 # register a Shell instance as a delegatee under the name 'my_shell'
 Job.DELEGATEES['my_shell'] = Shell()
 wrapper = JobBlock(
     'entry job', '''
     this job demonstrates how to use delegatees, say DFS or Pig
 ''')
 wrapper.add_plan(Job.INIT_JOB, Job.START, 'hadoop delegatee')
 wrapper.add_plan('hadoop delegatee', Job.DONE, 'wrong command')
 wrapper.add_plan('wrong command', Job.DONE, Job.LAST_JOB)
 '''
 prepare the jobs
 '''
 j = JobNode(id='hadoop delegatee',
             desc='''
     cat some file on the dfs (to run this tutorial, you have to prepare
     your own data on the dfs)
 ''')
 j.set_callback(delegated_job)
 wrapper.add_sub_job(j)
 # ==
 j = JobNode(id='wrong command',
             desc='''
     execute an erroneous command
 ''')
 j.set_callback(failed_delegated_job)
 wrapper.add_sub_job(j)
 '''
 run this tutorial on the Hadoop system
 '''
 # ==
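 '''
 a minimal sketch of how this flow would presumably be kicked off, mirroring
 the wrapper.execute() call in example #5; delegated_job and
 failed_delegated_job are assumed to be defined elsewhere in the tutorial.
 '''
 job_id, state = wrapper.execute()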
Example #2
 wrapper = JobBlock(
     'entry job', '''
     this job demonstrates how to use variablized configuration
 ''')
 wrapper.add_plan(Job.INIT_JOB, Job.START, 'hello Serious')
 wrapper.add_plan('hello Serious', Job.DONE, 'hello Kidding')
 wrapper.add_plan('hello Kidding', Job.DONE, 'Serious')
 wrapper.add_plan('Serious', Job.DONE, 'Kidding')
 wrapper.add_plan('Kidding', Job.DONE, Job.LAST_JOB)
 '''
 same as the previous tutorial,
 but we declare the output 'msg_to_[name]', which represents the message to be kept.
 the callbacks are also modified.
 '''
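 '''
 a rough, hypothetical sketch of one of the modified callbacks: it assumes
 the callback receives the job instance (as the self.get_input usage in
 example #3 suggests), and set_output is a made-up name, since only
 get_input appears in these snippets.
 '''
 def hello_job(self):
     # read the (variablized) input and keep it under the declared output key
     msg = self.get_input('msg')            # e.g. 'hello! Mr.Serious' after substitution
     self.set_output('msg_to_[name]', msg)  # hypothetical setter; '[name]' is presumably substituted like the input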
 # ==
 j_temp = JobNode(id='hello template', desc='say hello to someone')
 j_temp.need_input('msg', 'hello! Mr.[name]')
 j_temp.need_output('msg_to_[name]')
 j_temp.set_callback(hello_job)
 '''
 remember we mentioned in tutorial_04 that all the inputs should be
 explicitly declared. the same goes for outputs. actually, it's fine if you
 don't declare the outputs; the process will still execute correctly.
 however, this convention is meant to improve the readability of the code.
 a person who has just taken over your code may not be familiar with the flow.
 the declared outputs will be listed in the generated document and help them
 catch the key concepts of the job.
 '''
 '''
 same as previous tutorial
 '''
Example #3
 from copy import deepcopy
 wrapper = JobBlock(
     'entry job', '''
     this job demonstrates how to use the configuration mechanism for input data
 ''')
 wrapper.add_plan(Job.INIT_JOB, Job.START, 'hello Serious')
 wrapper.add_plan('hello Serious', Job.DONE, 'hello Kidding')
 wrapper.add_plan('hello Kidding', Job.DONE, Job.LAST_JOB)
 '''
 first, we build a template/prototype job for the hello jobs and assign
 a key-value pair as input. the input can be accessed in the callback
 via self.get_input(<key_of_the_input>). note that we bracket the name in the
 config value; it's a variablized config. we will explain it later.
 '''
 # ==
 j_temp = JobNode(id='template', desc='say hello to someone')
 j_temp.need_input('msg', 'hello! Mr.[name]')
 j_temp.set_callback(hello_job)
 '''
 instead of directly adding the template job to the wrapper,
 '''
 # wrapper.add_sub_job(j_temp)
 '''
 we make two copies of the template and give each the correct id and description.
 then, we assign the name to each job. you may guess the result - the input of
 the template job, "msg", gets "completed" by replacing "[name]" with
 the actual value we assign to each job.
 '''
 # ==
 j = deepcopy(j_temp)
 j.id = 'hello Serious'
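 '''
 a sketch of how these copies would presumably be completed; binding the name
 through need_input('name', ...) is an assumption, chosen because of the
 bracketed "[name]" placeholder described above.
 '''
 j.need_input('name', 'Serious')
 wrapper.add_sub_job(j)
 # == the second copy follows the same pattern
 j = deepcopy(j_temp)
 j.id = 'hello Kidding'
 j.need_input('name', 'Kidding')
 wrapper.add_sub_job(j)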
Example #4
 wrapper = JobBlock(
     'entry job', '''
     this job demonstrates how to use the dry run mechanism
 ''')
 wrapper.add_plan(Job.INIT_JOB, Job.START, 'foo')
 wrapper.add_plan('foo', Job.DONE, 'block')
 wrapper.add_plan('block', Job.DONE, 'fob')
 wrapper.add_plan('fob', Job.DONE, Job.LAST_JOB)
 '''
 now, we enable a secret switch to put the whole process into dry run mode
 '''
 wrapper.set_dry_run(True)
 '''
 prepare the jobs
 '''
 j = JobNode(id='foo', desc=''' foo ''')
 j.set_callback(foo_job)
 wrapper.add_sub_job(j)
 # ==
 j = JobBlock(id='block', desc=''' block ''')
 j.add_plan(Job.INIT_JOB, Job.START, 'bar')
 j.add_plan('bar', Job.DONE, 'foobar')
 j.add_plan('foobar', Job.DONE, Job.LAST_JOB)
 # --
 j_sub = JobNode(id='bar', desc=''' bar ''')
 j_sub.set_callback(foo_job)
 j.add_sub_job(j_sub)
 # --
 j_sub = JobNode(id='foobar', desc=''' foobar ''')
 j_sub.set_callback(foobar_job)
 j.add_sub_job(j_sub)
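 '''
 a sketch of how this flow would presumably be wrapped up: the block still has
 to be added to the wrapper, and running it mirrors the wrapper.execute() call
 in example #5. the 'fob' job and the foo_job / foobar_job callbacks are
 assumed to be defined elsewhere in the tutorial; with dry run enabled,
 execution is expected to walk the whole plan, including the nested block,
 without doing the real work.
 '''
 wrapper.add_sub_job(j)
 # ==
 job_id, state = wrapper.execute()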
Example #5
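    '''
    CFG and configs_for_jobs are assumed to exist before this snippet; the
    shapes below are purely illustrative (the path values are made up), just
    to make the Job.set_global call below concrete.
    '''
    CFG = {
        'a very long path blah blah': '/data/tutorial/input',
        'another very long path blah blah': '/data/tutorial/work',
        'yet another very long path blah blah': '/data/tutorial/output',
    }
    configs_for_jobs = CFG  # or a sub-dict holding only the keys the jobs need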
    # push every job-related config into the global store
    # (relies on Python 2's eager map; a plain for loop would work the same way)
    map(lambda key: Job.set_global(key, CFG[key]), configs_for_jobs.keys())

    wrapper = JobBlock(
        'entry job', '''
        this job demonstrates how to use the config management module
    ''')
    wrapper.add_plan(Job.INIT_JOB, Job.START, 'foo')
    wrapper.add_plan('foo', Job.DONE, Job.LAST_JOB)
    '''
    we can get the configs we just set as globals by giving the key without a value,
    or we can embed them into some other input.

    here we also introduce another usage of output:
    in tutorial_04, we set the output key without a value; that's a kind of
    declaration to announce 'we will put some value with that key as the output'
    (and later jobs can access it as input).
    this time, we do give a value to the output key, because we want the job to
    output something to the path we expect.
    '''
    j = JobNode(id='foo', desc=''' foo ''')
    j.need_input('a very long path blah blah')
    j.need_input(
        'composite path',
        '[another very long path blah blah]/append_with_a_sub_directory')
    j.need_output('output_path', '[yet another very long path blah blah]')
    j.set_callback(foo_job)
    wrapper.add_sub_job(j)

    job_id, state = wrapper.execute()
    #raw_input()
Example #6
    '''
    first, with the top-down design strategy, we define a JobBlock, which is
    like the wrapper but with its own plan.
    '''
    # ==
    j = JobBlock(id='block1', desc='block1')
    j.add_plan(Job.INIT_JOB, Job.START, 'job0')
    j.add_plan('job0', Job.DONE, 'job1')
    j.add_plan('job1', Job.DONE, Job.LAST_JOB)
    wrapper.add_sub_job(j)

    '''
    then, we define the inner JobNodes (same as previous tutorial)
    '''
    # ==
    j_sub = JobNode(id='job0', desc='desc0')
    j_sub.set_callback(normal_job)
    j.add_sub_job(j_sub)
    # ==
    j_sub = JobNode(id='job1', desc='desc1')
    j_sub.set_callback(normal_job)
    j.add_sub_job(j_sub)
    # ==

    '''
    BTW, here's a small tip.
    while designing a large flow, you may want to keep your code well organized
    by putting related things together.
    but sometimes, you can't assign the value you want right after the job is
    initialized, because the value has to be calculated/generated later.
    we provide the flexibility to delay that manipulation.