utopya_jobtree module

This module defines classes to create a sequence of job scripts.

  • The UtopyaJobStep class can be used to create a single element of a sequence. It can create and submit a stand-alone job file that performs a task configured by rcfile settings. In addition, it includes methods to create and submit a following job, which are used by the UtopyaJobTree class to generate a sequence of jobs.

  • The UtopyaJobTree class creates a sequence of jobs, configured as a tree with a main job and sub-jobs, which can have sub-jobs of their own, etc.

  • The UtopyaJobIteration class creates a job-tree with sub-jobs defined by an iteration number. The iteration over sub-jobs is stopped if certain conditions are reached.

The following classes are defined to perform actual tasks:

  • The dummy class UtopyaJobTask is used in the examples to show where a user-defined task should be specified.

  • The UtopyaJobTaskSubmit class can be used to create and submit a job tree (as defined in this module); this is useful to let a job submit other jobs and then continue with other work.

  • The UtopyaJobTaskRun class will call an external program, for example to postprocess created output.

  • If job timing is enabled, the UtopyaJobTreeTiming class can be used to summarize the time spent on the various parts of the job tree; see the description of the UtopyaJobStep class for details.

Class hierarchy

The classes are defined according to the following hierarchy:

Classes

class utopya_jobtree.UtopyaJobStep(name, rcfile, rcbase='', env={})

Bases: utopya_rc.UtopyaRc

Base class for single job step.

The ‘UtopyaJobStep’ class and its derivatives are used to form a chain of jobs, where each job contains lines that create and submit the next job. For each job, the rcfile settings define the class, where the job file is submitted (foreground or batch system), and the task to be performed.

Because this is the base class of the UtopyaJobTree class, which defines an actual sequence, it includes a number of methods to create and submit the next job. Without these, the class can be used to create a single stand-alone job file that performs a task configured by rcfile settings.

Simple usage

Example of simple usage:

# init UtopyaJobStep with name 'appl', and read settings for this name:
jbs = UtopyaJobStep( 'appl', 'settings.rc', rcbase='', env={} )
# write first job and submit:
jbs.Start()

As shown in the example, the following arguments are used on initialization:

  • name : Job step name, used to read settings and form the job file name.

  • rcfile : Name of settings file.

  • rcbase : Optional prefix for keywords in rcfile.

  • env : Optional dictionary with variables that will be exported to the environment.

An example of the rcfile settings needed for a job named ‘appl’ that should be submitted to an LSF queue:

! setup logging:
*.logging.level         :  info
*.logging.format        :  %(asctime)s [%(levelname)-8s] %(message)s

! class to create and submit this job within the jobtree:
appl.class                     :  utopya.UtopyaJobStep

! class with the job script creator:
appl.script.class              :  utopya.UtopyaJobScriptLSF

! work directory, here formed from job name "appl":
appl.workdir                   :  ${my.work}/__NAME2PATH__

! (optional) add line to change to work directory,
! for example because job scheduler does not do that:
appl.cwd                       :  True

! (optional) python search path:
appl.pypath                    :  ./py

! (optional) environment modules:
appl.modules                   :  load netcdf/4.4.1 ; load python/2.7

! (optional) extra environment variables that will be used
!   to setup the next jobscript:
appl.env                       :  HOMEDIR="${PWD}", WORKDIR="/work/me/appl"

! (optional) extra lines:
appl.lines                     :  os.environ['OMP_NUM_THREADS'] = os.environ['JOB_NTHREAD']

! task:
appl.task.class                :  utopya.UtopyaJobTask
appl.task.args                 :  msg='Perform application task.'

The logging settings are optional; change them to enable printing of debug messages or to change the format of the message lines. See the AddLogging() method for details.

The first specific setting is the class of the job step in <name>.class. This is a necessary setting for automatic creation of jobs by derived classes such as UtopyaJobTree and UtopyaJobIteration, where the creating job needs to know how to form the next job.

The next setting is <name>.script.class, which specifies the class used to create and submit the job script. This class should be derived from the UtopyaJobScript class; see the utopya_jobscript module for the classes available by default. The example settings above use the LSF script class, which submits the job to an LSF queue; for a queue system, the job options should be specified as well. An example for the LSF queue system is

! Specify the module and class with the job script creator:
appl.script.class              :  utopya.UtopyaJobScriptLSF

! batch options for "UtopyaJobScriptLSF" class:
appl.batch.lsf.format          :  batch.lsf.format
appl.batch.lsf.options         :  name output error
appl.batch.lsf.option.name     :  J %(env:name)
appl.batch.lsf.option.output   :  oo %(name).out
appl.batch.lsf.option.error    :  eo %(name).err

! Define format of lsf options, here:
!   #BSUB -flag value
batch.lsf.format.comment           :  #
batch.lsf.format.prefix            :  BSUB
batch.lsf.format.arg               :  -
batch.lsf.format.assign            :  ' '
batch.lsf.format.template          :  %(key)
batch.lsf.format.envtemplate       :  %(env:key)
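To illustrate how the format settings and the option values combine, the following Python sketch forms the option lines for the ‘appl’ job above. The helper format_options() is hypothetical (not part of utopya); it assumes that both %(key) and %(env:key) are filled from the job environment, and that a single space separates the prefix from the flag.

```python
import re

def format_options(options, env, comment="#", prefix="BSUB", arg="-", assign=" "):
    """Form batch option lines such as '#BSUB -J appl' (hypothetical helper).

    options : list of 'flag value' strings, e.g. 'J %(env:name)'
    env     : dictionary with values for the %(key) and %(env:key) templates
    """
    lines = []
    for option in options:
        flag, value = option.split(None, 1)
        # replace both template styles by the value from the environment:
        value = re.sub(r"%\((?:env:)?(\w+)\)", lambda m: str(env[m.group(1)]), value)
        lines.append(f"{comment}{prefix} {arg}{flag}{assign}{value}")
    return lines

options = ["J %(env:name)", "oo %(name).out", "eo %(name).err"]
for line in format_options(options, {"name": "appl"}):
    print(line)
```

This prints ‘#BSUB -J appl’, ‘#BSUB -oo appl.out’ and ‘#BSUB -eo appl.err’.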

The Python search path (for the utopya modules) can be specified using the <name>.pypath setting, read on initialization. Multiple search directories can be defined using a colon-separated list. This setting is not required and may be omitted.

Another optional setting is the definition of extra environment variables using the <name>.env setting, read on initialization. The specified content is converted to a dictionary, and the key/value pairs are exported to the environment.
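A minimal sketch of this conversion, assuming that the value uses Python keyword-argument syntax as in the appl.env example above. The functions parse_env_setting() and export_env() are hypothetical helpers, and the expansion of ${...} references with os.path.expandvars is an assumption as well.

```python
import ast
import os

def parse_env_setting(value):
    """Convert an rc value like 'A="x", B="y"' into a dictionary (hypothetical helper)."""
    # parse the value as the keyword arguments of a dummy function call:
    call = ast.parse(f"f({value})", mode="eval").body
    # only literal values (strings, numbers) are accepted:
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}

def export_env(env):
    """Export the key/value pairs to the environment, expanding ${VAR} references."""
    for key, val in env.items():
        os.environ[key] = os.path.expandvars(str(val))

env = parse_env_setting('HOMEDIR="${PWD}", WORKDIR="/work/me/appl"')
export_env(env)
print(os.environ["WORKDIR"])  # /work/me/appl
```

Parsing with ast.literal_eval, rather than eval(), keeps arbitrary code in the settings file from being executed.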

The actual work should be done by one or more objects, for which the class and initialization arguments are defined by <name>.<task>.* settings. For the above example, the following lines will be inserted in the job script:

tskclass = utopya.ImportClass( "utopya.UtopyaJobTask" )
tsk = tskclass(msg='Perform application task.')

The arguments could include the template ‘%{rcfile}’ to insert the name of the settings file:

# use "settings.rc" from the example:
appl.task.args      :  rcfile='%{rcfile}', rcbase='appl'

Similarly, the templates ‘%{workdir}’ and ‘%{name}’ can be used to insert the work directory and job name, respectively; note that the work directory always contains the pre-processed settings that were used to create the job, in the file ‘%{name}.rc’:

# use the pre-processed settings in the work directory:
appl.task.args      :  rcfile='%{workdir}/%{name}.rc', rcbase='appl'

The ‘task’ part of the rc keys is actually an element of a list. Use the following settings to define a list of three tasks:

! tasks:
appl.tasks                      :  wakeup work sleep

! task:
appl.wakeup.class               :  utopya.UtopyaJobTask
appl.wakeup.args                :  msg='Wake up!'

! task:
appl.work.class                 :  utopya.UtopyaJobTask
appl.work.args                  :  msg='Work ...'

! task:
appl.sleep.class                :  utopya.UtopyaJobTask
appl.sleep.args                 :  msg='... and go to sleep'

If no task list is specified, the default list has a single element named ‘task’.
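The following sketch shows how such a task list could be turned into the script lines shown earlier. The settings are represented by a plain dictionary, and task_lines() is a hypothetical helper; it assumes the <name>.<task>.class and <name>.<task>.args keys used in the example above.

```python
def task_lines(name, settings):
    """Form job script lines for all tasks of job 'name' (hypothetical helper).

    settings : dictionary with rc keys and values
    """
    # default task list has a single element named 'task':
    tasks = settings.get(f"{name}.tasks", "task").split()
    lines = []
    for task in tasks:
        cls = settings.get(f"{name}.{task}.class", "")
        if not cls:
            continue  # no class name: nothing is inserted
        args = settings.get(f"{name}.{task}.args", "")
        lines.append(f'tskclass = utopya.ImportClass( "{cls}" )')
        lines.append(f"tsk = tskclass({args})")
    return lines

settings = {
    "appl.tasks": "wakeup work sleep",
    "appl.wakeup.class": "utopya.UtopyaJobTask",
    "appl.wakeup.args": "msg='Wake up!'",
    "appl.work.class": "utopya.UtopyaJobTask",
    "appl.work.args": "msg='Work ...'",
    "appl.sleep.class": "utopya.UtopyaJobTask",
    "appl.sleep.args": "msg='... and go to sleep'",
}
print("\n".join(task_lines("appl", settings)))
```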

Job timing

The timing flag could be enabled to have special lines added to the job file to generate a profile of the run times:

! add timing statements to the job script (True|False) ?
appl.timing                    :  True

The timing code that is inserted uses classes from the utopya_timing module, and looks like:

# start timing, use only element as name:
timer = utopya.UtopyaTimerTree( "wakeup" )

# task class:
tskclass = utopya.ImportClass( "MyMod.DoSomething" )
# create task object and initialize, which does the actual work:
tsk = tskclass( 'first argument' )

# assume that timing info is stored in the task object
# as an attribute named "timer" ...
if hasattr(tsk,"timer") :
    # store as branch:
    timer.AddBranch( tsk.timer )
#endif

# stop timing:
timer.End()
# postprocess (add "other" branches):
timer.Post()
# save:
timer.Save( "appl.wakeup" )

The first inserted command creates an object of the UtopyaTimerTree class. This object can hold a tree of timing objects, each consisting of at least a name (str) and the number of seconds spent (float). In this example the timer is named after the last element of the job name (‘wakeup’), while the profile is saved under the full job name (‘appl.wakeup’). The stopwatch is started immediately at initialization and stopped by the UtopyaTimerTree.End() method called in the last block; the task whose run time should be measured is performed by the task object created in between. The task object may use the classes from the utopya_timer module too. If these are stored in an attribute named timer, the lines in the third block add these “sub” timers as a branch to the “main” timer, to show in more detail where the time is actually spent. The last inserted lines postprocess the timer tree and write its content to a file:

appl.wakeup.prf
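To illustrate the tree structure described above, the following sketch implements a minimal timer node with a name, a number of seconds and a list of branches. TimerNode is a hypothetical stand-in, not the UtopyaTimerTree class; it assumes that the “other” branch added by the postprocessing holds the time not covered by the sub-timers, and its text profile format is invented.

```python
import time

class TimerNode:
    """Minimal timer tree node (hypothetical stand-in, not UtopyaTimerTree)."""

    def __init__(self, name, clock=time.perf_counter):
        self.name = name
        self.clock = clock
        self.start = clock()   # stopwatch starts at initialization
        self.seconds = None
        self.branches = []

    def add_branch(self, timer):
        self.branches.append(timer)

    def end(self):
        self.seconds = self.clock() - self.start

    def post(self):
        """Add an 'other' branch with the time not covered by the branches."""
        if self.branches:
            covered = sum(branch.seconds for branch in self.branches)
            other = TimerNode("other", clock=lambda: 0.0)  # fixed clock, not timed
            other.seconds = max(self.seconds - covered, 0.0)
            self.branches.append(other)

    def lines(self, indent=""):
        """Return a text profile with one line per timer."""
        out = [f"{indent}{self.name}: {self.seconds:.2f} s"]
        for branch in self.branches:
            out.extend(branch.lines(indent + "  "))
        return out

# usage with a fake clock, so that the result is reproducible:
ticks = iter([0.0, 1.0, 3.0, 10.0])

def fake_clock():
    return next(ticks)

timer = TimerNode("wakeup", clock=fake_clock)   # t=0
sub = TimerNode("task", clock=fake_clock)       # t=1, e.g. created inside the task
sub.end()                                       # t=3
timer.add_branch(sub)
timer.end()                                     # t=10
timer.post()
print("\n".join(timer.lines()))
```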

The special UtopyaJobTreeTiming class could be used as job task to collect all saved time profiles in the job tree and create an overall timing tree to illustrate the total run time spent on the jobs.

Job chain settings

In case of a job chain, a finished job sets up and submits the next one. The following optional setting specifies the work directory where the next job file is written (default is the current directory):

! (optional) working directory:
appl.wakeup.workdir        :  /scratch/you/appl-dir

A special template __NAME2PATH__ can be included in the path to insert the job name (appl.wakeup) with the dots replaced by path-separation characters; for example:

! working directory incl. subdirs for name elements:
appl.wakeup.workdir        :  /scratch/you/__NAME2PATH__

will be expanded to:

/scratch/you/appl/wakeup
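The expansion can be sketched in a few lines of Python; expand_name2path() is a hypothetical helper that uses os.path.join, so the separator matches the operating system.

```python
import os

def expand_name2path(workdir, name):
    """Replace __NAME2PATH__ by the job name with dots turned into path separators."""
    return workdir.replace("__NAME2PATH__", os.path.join(*name.split(".")))

print(expand_name2path("/scratch/you/__NAME2PATH__", "appl.wakeup"))
# /scratch/you/appl/wakeup on POSIX systems
```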

The following optional setting specifies the rcfile that should be used to initialize the object of the next job (default is the rcfile used for the finished job):

! (optional) settings to initialize the job step:
appl.wakeup.rcfile         :  rc/my-appl-wakeup.rc

Overview of methods

With the above settings it should be possible to define all application-dependent features of the jobs. If the class needs to be extended, the following overview of the underlying methods might help.

GetGenericName()

Return own generic name, useful for derived classes to obtain extra settings.

GetVariables(element)

Return a dictionary with job variables for this class. This is used by the AddVariables() method to add definition lines to the job script.

Append(line)

Add line to script content. A newline is added automatically.

AddHeader()

Add interpreter line to script content, here for a Python script.

AddOptions()

Add batch job options to script content.

If and how job options are formed is controlled by the script class, defined in the rcfile for each job. Example for a job name appl:

! Specify the module and class with the job script creator:
appl.script.class              :  utopya.JobScriptLSF

The script class is derived from the JobScript class, and has a method JobScript.GetOptionsRc() to form lines that can be added to the script. The method reads settings from the rcfile for the provided generic name:

! batch options:
appl.batch.lsf.format          :  lsf_format
appl.batch.lsf.options         :  name output error workdir
appl.batch.lsf.option.name     :  J %(env:name)
appl.batch.lsf.option.output   :  oo %(name).out
appl.batch.lsf.option.error    :  eo %(name).err
appl.batch.lsf.option.workdir  :  cwd %(env:cwd)

An environment is passed with pre-defined values that can be substituted in the option values, here the actual job name and the current working directory:

env = { 'name' : 'step0001',
        'cwd'  : '/scratch/me/test' }

With the following ‘lsf_format’ option formatting:

! Define format of lsf options, here:
!   #BSUB -flag value
lsf_format.comment           :  #
lsf_format.prefix            :  BSUB
lsf_format.arg               :  -
lsf_format.assign            :  ' '
lsf_format.template          :  %(key)
lsf_format.envtemplate       :  %(env:key)

this will lead to the following option lines:

#BSUB -J step0001
#BSUB -oo step0001.out
#BSUB -eo step0001.err
#BSUB -cwd /scratch/me/test

AddModules()

Add lines to import standard modules (os, sys) and tool modules (utopya).

AddLogging()

Add script lines to configure how the logging module displays messages. The logging module is used everywhere in the UTOPyA code to print messages:

# modules:
import logging

# print messages of different levels:
logging.info   ( 'this is an informative message, ...' )
logging.warning( '... this is a warning, ...' )
logging.debug  ( '... this shows a message for debugging, ...' )
logging.error  ( '... and this is an error message.' )

The rcfile may contain optional settings for the minimum level of the messages that are shown, and for the format of the messages:

! (optional) message level: info | debug
<name>.logging.level   :  info

! (optional) message formatting:
<name>.logging.format  :  [%(levelname)-8s] %(message)s
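A minimal sketch of how such settings could be applied with the standard logging module; configure_logging() is a hypothetical helper, and whether the inserted job script lines use logging.basicConfig in exactly this way is an assumption.

```python
import logging

def configure_logging(level="info", fmt="[%(levelname)-8s] %(message)s"):
    """Apply <name>.logging.level and <name>.logging.format values (sketch)."""
    # force=True (Python 3.8+) replaces any handlers configured earlier
    logging.basicConfig(level=getattr(logging, level.upper()), format=fmt, force=True)

configure_logging("debug", "%(asctime)s [%(levelname)-8s] %(message)s")
logging.debug("shown, since the level is 'debug'")
```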


AddEnvModules()

Add script lines for GNU Environment modules. On many computing platforms, the environment for running applications is managed using ‘module’ commands, e.g.:

module load netcdf
module load python

The module commands to be performed are optionally defined in the rcfile for the current job as a semicolon-separated list:

<name>.modules        :  load netcdf ; load python

Appropriate job lines for these settings will be inserted in the script.
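The sketch below shows one plausible form of such lines for a Python job script. It assumes the init/python.py file shipped with Environment Modules under MODULESHOME, which defines a module() function; env_module_lines() is a hypothetical helper, and the lines utopya actually inserts may differ. The optional moduleshome argument corresponds to the <name>.moduleshome setting.

```python
def env_module_lines(modules, moduleshome=None):
    """Form Python job script lines for a setting like 'load netcdf ; load python'."""
    lines = ["import os"]
    if moduleshome:
        # optional <name>.moduleshome setting overrides the environment:
        lines.append(f"os.environ['MODULESHOME'] = {moduleshome!r}")
    # assumed: the Python init file of Environment Modules defines module():
    lines.append("exec(open(os.path.join(os.environ['MODULESHOME'], 'init', 'python.py')).read())")
    for command in modules.split(";"):
        words = command.split()
        if words:
            lines.append("module(" + ", ".join(repr(w) for w in words) + ")")
    return lines

print("\n".join(env_module_lines("load netcdf ; load python")))
```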

The location of the GNU module scripts should be available in the environment:

MODULESHOME=/opt/modules/3.2.10.4

If the location is not set correctly in the environment, override it with the following setting:

<name>.moduleshome    :  /opt/modules/3.2.10.4

AddLines()

Add user-defined script lines. This could be used to set up a special environment.

For example, if the job scheduler defines an environment variable ‘JOB_NTHREAD’ for the number of OpenMP threads, the jobscript could use this to define the ‘OMP_NUM_THREADS’ variable needed by OpenMP code. In a python script, this looks like:

os.environ['OMP_NUM_THREADS'] = os.environ['JOB_NTHREAD']

Specify this code in the settings:

<name>.lines    :  os.environ['OMP_NUM_THREADS'] = os.environ['JOB_NTHREAD']

For a complete block of code, use ‘\n’ marks for line breaks, ‘\t’ for leading indents, and a multi-line rc value for a more readable definition. For example, the following definition:

<name>.lines    : \n\
    print( 'environment:' )\n\
    for key in os.environ.keys() :\n\
    \tprint( '  %s=%s' % (key,os.environ[key]) )

will be expanded to:

print( 'environment:' )
for key in os.environ.keys() :
    print( '  %s=%s' % (key,os.environ[key]) )

AddCwd(_indent='')

Add script lines to change to the work directory. This is necessary if the job scheduler does not change to the directory of the job script, or if the job options have no flag to specify it. These lines are only included if the following flag is set:

<name>.cwd        :  True

AddVariables()

Add commands to set job variables that might be used by the tasks.

Always a variable ‘name’ is defined with the job name.

Extra variables might be defined by specific classes; for example, an iteration class might define the iteration step. These are collected in a dictionary named “env” with keys formed from the names of the previous jobs in the tree. An iteration job named ‘appl.run’ stores for example:

env["appl.run.__step__"] = 4

AddTasks()

Add task commands to the script lines; each consists of a class import and the initialization of an object of this class. The class name and the initialization arguments are defined in the rcfile settings:

<name>.task.class     :  mymod.MyTask
<name>.task.args      :  msg='Do something'

This will insert the following lines in the job script:

tskclass = utopya.ImportClass( "mymod.MyTask" )
tsk = tskclass(msg='Do something')

The arguments can include templates to insert current values:

  • %{name} : full job name

  • %{root} : root of name, thus without last element

  • %{root_generic} : root of generic name

  • %{rcfile} : file with jobtree settings.

  • %{workdir} : work directory

Example of usage:

appl.task.args      :  rcfile='%{rcfile}', rcbase='applx'
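A minimal sketch of the template expansion, assuming plain text substitution; expand_task_args() is a hypothetical helper, and the job names in the usage line are taken from the iteration example further on.

```python
def expand_task_args(args, name, name_generic, rcfile, workdir):
    """Fill the %{...} templates in a task argument string (hypothetical helper)."""
    values = {
        "name": name,
        "root": name.rsplit(".", 1)[0],                  # name without last element
        "root_generic": name_generic.rsplit(".", 1)[0],  # same for generic name
        "rcfile": rcfile,
        "workdir": workdir,
    }
    for key, value in values.items():
        args = args.replace("%{" + key + "}", value)
    return args

print(expand_task_args("rcfile='%{rcfile}', rcbase='applx'",
                       "appl.run.iter-0004.fwd", "appl.run.iter-NNNN.fwd",
                       "settings.rc", "/work/appl"))
```

Because the closing brace is part of each pattern, ‘%{root}’ is never matched inside ‘%{root_generic}’.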

If the class name is left empty, nothing is inserted and no arguments need to be specified.

The ‘task’ part of the rc keys is actually an element of a list. Use the following settings to define a list of three tasks:

! tasks:
appl.tasks                      :  wakeup work sleep

! task:
appl.wakeup.class               :  utopya.UtopyaJobTask
appl.wakeup.args                :  msg='Wake up!'

! task:
appl.work.class                 :  utopya.UtopyaJobTask
appl.work.args                  :  msg='Work ...'

! task:
appl.sleep.class                :  utopya.UtopyaJobTask
appl.sleep.args                 :  msg='... and go to sleep'

If no task list is specified, the default list has a single element named ‘task’.

AddNextJob(without_next=False)

Add lines that create and submit the next job step if necessary.

AddFooter()

Add closing lines to script content.

GetFileName(name)

Return name of job file to be written. Here use the job name and add extension ‘.jb’.
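The file naming, combined with a foreground submission as described for WriteAndSubmit(), can be sketched as follows. Both functions are hypothetical; the sketch only covers running a Python job script in the foreground and waiting for it, while submission to a batch system is handled by the script class.

```python
import os
import subprocess
import sys
import tempfile

def get_file_name(name):
    """Return the job file name: the job name with extension '.jb'."""
    return f"{name}.jb"

def write_and_run(name, lines, workdir="."):
    """Write Python script lines to <workdir>/<name>.jb and run it in the foreground."""
    os.makedirs(workdir, exist_ok=True)
    path = os.path.abspath(os.path.join(workdir, get_file_name(name)))
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    # foreground: run with the current interpreter and wait for completion
    result = subprocess.run([sys.executable, path], cwd=workdir,
                            capture_output=True, text=True)
    return path, result

workdir = tempfile.mkdtemp()
path, result = write_and_run("appl.build", ["print('hello from job')"], workdir)
print(os.path.basename(path), result.returncode, result.stdout.strip())
```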

WriteAndSubmit(_indent='')

Write script content to file with provided name, and submit the created file. Creation and submission is performed by an object derived from the JobScript class. The name of this class is defined in the settings, as well as the working directory (empty for current):

<name>.script.class         :  utopya.JobScriptForeground
<name>.workdir              :  /work/appl/run

Run(without_next=False, single=False, _indent='')

Create and submit the job for the named item in the job step.

The content of the job file is filled using calls to the Add*() methods described above.

The file is written and submitted by a call to the WriteAndSubmit() method.

Start(single=False)

Create an object for the first job along the tree that is not marked as virtual, and call its Run() method.

The single flag is passed to the Run() method; if enabled, only a single job step is performed.

CheckStatus(_indent='')

Check status of this job. This method will first look in the work directory for a file holding the process or job id:

<name>.pid

If it is not found, an error is raised. The file with the process id is then passed to the CheckStatus() method of the script class used by this job.
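For a job run in the foreground, a status check could look like the sketch below, assuming that the .pid file holds a single process id. is_running() is a hypothetical helper that only works on POSIX systems; jobs in a batch queue need a query of the queue system instead.

```python
import os
import tempfile

def is_running(pidfile):
    """Return True if the process id stored in 'pidfile' still exists (POSIX only)."""
    if not os.path.isfile(pidfile):
        raise FileNotFoundError(f"pid file not found: {pidfile}")
    with open(pidfile) as f:
        pid = int(f.read().split()[0])
    try:
        os.kill(pid, 0)       # signal 0: only test whether the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True           # process exists but belongs to another user
    return True

# usage: store the id of the current process, which is certainly running
pidfile = os.path.join(tempfile.mkdtemp(), "appl.build.pid")
with open(pidfile, "w") as f:
    f.write(f"{os.getpid()}\n")
print(is_running(pidfile))
```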

GetNextElement(element, parfirst=None, indent='')

Return the next element in a list (or iteration) of job steps. For derived classes such as ‘UtopyaJobTree’ it is sufficient to re-define only this method.

If no next sub-element is available, the value ‘None’ is returned. If the requested element is ‘None’, the name of the first sub-element is returned (if present).

GetNextName(finish=False, indent='', check_jump=True, parfirst=False)

Return information on the next job name in a chain, including information on whether a test on continuation of the chain should be performed (if necessary).

Two values are returned:

  • the next job name;

  • the name of the job step that should decide on continuation (or ‘None’ if not needed).

If the job chain is a list with a flexible end (for example an iteration sequence) and the finish flag is enabled, the next job after the end of the list is returned.

If the job is an element of a parallel list and the parfirst flag is enabled, the next job after the end of the list is returned.

The name of the next job is read from the job tree definition in the rcfile. The next job can also be specified directly using an ad hoc setting, which is useful to skip a part of the tree:

<name>.jump-to     :  nextname

This feature could be disabled with check_jump=False.

GetFirstName(name, indent='')

Return first name along tree that is not marked as “virtual”.

CheckContinuation(element)

Check the status given the job name and its generic representation. The job name can be used to derive the iteration step number and to read output files; the generic name is useful to read settings.

Returns two str values:

  • order : one of ‘continue’, ‘finish’, ‘error’

  • msg : informative message to explain the order

Here always return ‘finish’ since this is a single job.

class utopya_jobtree.UtopyaJobTree(name, rcfile, rcbase='', env={})

Bases: utopya_jobtree.UtopyaJobStep

Class to create and submit a chain of job scripts defined in the rcfile as a tree with branches.

Example of a tree defined for name ‘appl’:

appl
    .build
    .init
         .emis
         .obs
             .point
             .sat
             .valid
    .run
        .fwd
        .dep
        .grd
        .opt
    .done

This will create the job chain (see below for the meaning of “v”):

appl.jb                   (v)
appl.build.jb
appl.init.jb              (v)
appl.init.emis.jb
appl.init.obs.jb          (v)
appl.init.obs.point.jb
appl.init.obs.sat.jb
appl.init.obs.valid.jb
appl.run.jb               (v)
appl.run.fwd.jb
appl.run.dep.jb
appl.run.grd.jb
appl.run.opt.jb
appl.done.jb
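The order of this chain follows from a depth-first walk over the element lists, as in the sketch below. Plain dictionaries stand in for the rc settings and job_chain() is a hypothetical helper; the actual classes compute one next job at a time (see GetNextName()) rather than the whole chain at once.

```python
def job_chain(name, elements, virtual):
    """Yield (job name, virtual flag) in the order in which the jobs follow each other."""
    yield name, virtual.get(name, False)
    for element in elements.get(name, []):
        yield from job_chain(f"{name}.{element}", elements, virtual)

elements = {
    "appl": ["build", "init", "run", "done"],
    "appl.init": ["emis", "obs"],
    "appl.init.obs": ["point", "sat", "valid"],
    "appl.run": ["fwd", "dep", "grd", "opt"],
}
virtual = {"appl": True, "appl.init": True, "appl.init.obs": True, "appl.run": True}

for jobname, is_virtual in job_chain("appl", elements, virtual):
    print(f"{jobname}.jb", "(v)" if is_virtual else "")

# first job that is actually created (compare the Start() method):
first = next(n for n, v in job_chain("appl", elements, virtual) if not v)
```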

The tree is defined by lists of element names. The definitions in the settings should first define the elements of the main trunk. Example for settings for a job named ‘appl’:

! class to create a job tree:
appl.class                     :  utopya.UtopyaJobTree

! list of sub-elements:
appl.elements                  :  build init run done

A job is also created for the trunk, in this case “appl”. This is useful if resources (memory, CPUs) should be allocated once for all jobs in a sub-tree; to achieve this, define resources for the trunk, and submit the elements to the foreground. The trunk job is skipped if it is declared “virtual”; in the above example, the jobs that can be skipped in this way are marked with a “v”. A trunk is declared virtual by an optional rc setting, which is False if not defined:

! virtual main job?
appl.virtual                   :  True

For each element in the list it is necessary to define the class that should be used to create it. For the “appl.build” job, this could be simply the “UtopyaJobStep” class since no sub-jobs are necessary:

! job step class for this branch:
appl.build.class                     :  utopya.UtopyaJobStep

For the “appl.init” job however, sub-jobs are defined. Use for this the “UtopyaJobTree” class again, and define a list with the sub elements:

! UtopyaJobTree class for this branch:
appl.init.class                      :  utopya.UtopyaJobTree
appl.init.elements                   :  emis obs

The elements are combined with the ‘parent’ elements and form together the full job name, for example “appl.init.emis”.

For all names (non-virtual) in the tree, define the class that should be used to create and submit individual jobs. If the jobs are to be submitted to a queue, specify job options too. Example for the ‘appl.build’ step:

! submit to LSF queue:
appl.build.script.class                 :  utopya.JobScriptLSF

! batch options:
appl.build.batch.lsf.format             :  batch.lsf
appl.build.batch.lsf.options            :  name output error
appl.build.batch.lsf.option.name        :  J %(env:name)
appl.build.batch.lsf.option.output      :  oo %(name).out
appl.build.batch.lsf.option.error       :  eo %(name).err

The actual work is again performed by an object derived from the UtopyaJobTask class, for which proper initialization arguments should be specified:

appl.build.task.class     :  mymod.MyTask
appl.build.task.args      :  msg='Do something'

While testing the job tree it is sometimes useful to skip a number of sub-jobs. This can be specified by a ‘jump-to’ setting. If this is present for a certain job name, its value should be the name of the next job that should be created. For example, to build new scripts and executables but skip the initialization steps, include in the settings:

appl.build.jump-to      :  appl.run

GetNextElement(element, parfirst=None, indent='')

Returns next element from the list to which the requested element belongs. For example, for the name “init.obs” the list of elements is defined by:

init.obs.elements   :  point sat valid

In this example, the next element after “point” is “sat”.

If the requested element is ‘None’, return the first element (“point”). Otherwise, the requested element should be in the defined list, and either the next element is returned, or ‘None’ if the last element was requested.
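The rule can be sketched as a small function on a plain Python list; next_element() is a hypothetical helper, not the method itself.

```python
def next_element(elements, element=None):
    """Return the element after 'element', the first one for None, or None after the last."""
    if element is None:
        return elements[0] if elements else None
    i = elements.index(element)        # raises ValueError if not in the list
    return elements[i + 1] if i + 1 < len(elements) else None

obs = "point sat valid".split()       # value of init.obs.elements
print(next_element(obs), next_element(obs, "point"), next_element(obs, "valid"))
```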

class utopya_jobtree.UtopyaJobParallel(name, rcfile, rcbase='', env={})

Bases: utopya_jobtree.UtopyaJobStep

Class to create and submit a series of job scripts that will run in parallel.

The elements of the series are defined by the following properties:

  • a generic name that is used to define the rcfile settings;

  • a format to create an element name (iteration) from an integer step number;

  • the first and the last ‘step’ number, each step number is one parallel job.

As example, the following settings define a series of 4 elements:

! job:
appl.run.class                                  :  utopya.UtopyaJobParallel

! generic name for elements:
appl.run.generic                                :  part-NN
! formatting rule for actual step names given
! an integer step number; 
! syntax should follow str.format() rules ; 
! here 2 digits with zero padding:
appl.run.step_format                            :  part-{step:0>2}
! initial step number:
appl.run.step_start                             :  1
! maximum possible number for defined format:
appl.run.step_max                               :  4
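The step_format value follows the Python str.format() rules, so the element names for these settings can be checked directly in Python:

```python
# settings of the example above:
step_format = "part-{step:0>2}"
step_start, step_max = 1, 4

# one element name per step number:
names = [step_format.format(step=step) for step in range(step_start, step_max + 1)]
print(names)  # ['part-01', 'part-02', 'part-03', 'part-04']
```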

It is sometimes useful to have a final element that collects the results of the parallel jobs; specify the element name with:

! add final element to collect results:
appl.run.final            : gather

The job class and other job settings, such as the tasks to be performed, should be defined using the generic name. The task class might need to know its own step number; it is available as ‘<name>.__step__’ in the job environment dictionary and can be used in the task class arguments. The following example defines jobs that wait for a number of seconds proportional to the step number, using the UtopyaJobTaskWait class:

! job:
appl.run.part-NN.class                :  utopya.UtopyaJobStep
! task:
appl.run.part-NN.task.class           :  utopya.UtopyaJobTaskWait
appl.run.part-NN.task.args            :  msg='Perform part of task.', \
                                          nsec=5*env['appl.run.__step__'], \
                                          nsecinfo=1

Only the last parallel job in the series creates the next job, which is the ‘gather’ job if defined, or otherwise the next job in the tree. Typically the next job performs two tasks:

  • wait for all the parallel jobs to be finished using the UtopyaJobParallelWait class;

  • gather the output from the parallel jobs if necessary.

The following example illustrates how these tasks could be configured:

! job:
appl.run.gather.class             :  utopya.UtopyaJobStep
! tasks:
appl.run.gather.tasks             :  wait post
! task:
appl.run.gather.task.wait.class   :  utopya.UtopyaJobParallelWait
appl.run.gather.task.wait.args    :  'appl.run', 'appl.run', '%{rcfile}'
! task:
appl.run.gather.task.post.class   :  myTool.GatherPar
appl.run.gather.task.post.args    :  'appl.run', ...

GetNextElement(element, parfirst=False, indent='')

Return name of next element.

For example, if the element passed as argument is ‘part-02’, then the returned value is ‘part-03’.

If the element passed as argument is ‘None’, then the name that corresponds to the first iteration step is returned.

If the passed element is the last one, or if the passed element is not None but the parfirst flag is enabled, then ‘None’ is returned.

GetStepNumber(element)

Extract step number from element name and return as integer.

The current implementation uses the step range setting in the rcfile (from ‘step_start’ to ‘step_max’) to perform a loop over possible step numbers. For each step, the ‘step_format’ is evaluated and compared to the provided element; if a match is found, the step is known. This brute-force test should in the future be replaced by a more elegant method that reads the number given the format.
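The brute-force search, and the next-element rule built on it, can be sketched as follows. Both functions are hypothetical stand-alone helpers; the step_size argument is borrowed from the iteration settings, and the parfirst flag is not modelled.

```python
def get_step_number(element, step_format, step_start, step_max):
    """Return the step number for which 'step_format' gives 'element' (brute force)."""
    for step in range(step_start, step_max + 1):
        if step_format.format(step=step) == element:
            return step
    raise ValueError(f"element {element!r} does not match format {step_format!r}")

def get_next_element(element, step_format, step_start, step_max, step_size=1):
    """Return the next element name, the first one for None, or None after the last."""
    if element is None:
        return step_format.format(step=step_start)
    step = get_step_number(element, step_format, step_start, step_max) + step_size
    return step_format.format(step=step) if step <= step_max else None

fmt = "part-{step:0>2}"
print(get_step_number("part-03", fmt, 1, 4))    # 3
print(get_next_element("part-02", fmt, 1, 4))   # part-03
```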

GetVariables(element)

Return a dictionary with job variables, for this class the iteration step:

{ '__step__' : 4 }

AddTasks()

Add commands to the job script that submit sub jobs in parallel. The lines contain a ‘for’ loop over the step numbers, where in each step the UtopyaJobTaskSubmit class is used to create and submit a jobfile for that particular step.

Only the last parallel job in the list will create the next job in the tree. Typically the next job first waits for all the parallel jobs to be finished using the UtopyaJobParallelWait class, and then gathers the output from the parallel jobs if necessary.

AddNextJob(without_next=False)

Usually this method adds the lines that create and submit the next job step (if necessary), but for this class the last element of the parallel jobs will do that. Therefore this method just inserts some comment lines. The arguments are ignored.

GetParallelJobs()

Return info on job names performed in parallel:

  • jobnames : list with actual job names;

  • jobname_generic : generic name of the jobs performed in parallel.

class utopya_jobtree.UtopyaJobParallelWait(root, root_generic, rcfile, rcbase='', env={}, nsec=1, _indent='')

Bases: utopya_rc.UtopyaRc

Class to wait for all elements of a parallel job to be finished.

This class will use the class definition to create a temporary instance of a job, mainly to find the actual work directory of the job. In this work directory a file <name>.pid is expected to be present, which contains information needed to figure out the current job status; probably this is just an integer number which is the job id.

In addition, an instance of the script class is created, whose CheckStatus method is called with the .pid file as argument. Depending on the script class, this method checks either the running processes or a batch queue for the job status.

Arguments:

  • root : name of parallel job

  • root_generic : generic name of parallel job

  • rcfile : settings for job tree

Optional arguments:

  • rcbase : prefix for rcfile keys

  • env : environment dictionary to expand variables used in rcfile

  • nsec : number of seconds to wait between status checks
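The waiting itself amounts to a polling loop, sketched below under the assumption that a status check per .pid file is available. wait_for_jobs() is a hypothetical helper, and the is_running argument stands in for the CheckStatus method of the script class; sleep is injectable so that the usage example runs without delays.

```python
import time

def wait_for_jobs(pidfiles, is_running, nsec=1, sleep=time.sleep):
    """Block until none of the jobs described by 'pidfiles' is running anymore.

    is_running : callable checking the status for one pid file
                 (stand-in for the CheckStatus method of the script class)
    """
    remaining = list(pidfiles)
    while True:
        remaining = [p for p in remaining if is_running(p)]
        if not remaining:
            break
        sleep(nsec)

# usage with a fake status check: 'part-01' needs one more poll than 'part-02'
polls_left = {"part-01.pid": 2, "part-02.pid": 1}

def fake_is_running(pidfile):
    polls_left[pidfile] -= 1
    return polls_left[pidfile] > 0

sleeps = []
wait_for_jobs(list(polls_left), fake_is_running, nsec=5, sleep=sleeps.append)
print(sleeps)
```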

class utopya_jobtree.UtopyaJobIteration(name, rcfile, rcbase='', env={})

Bases: utopya_jobtree.UtopyaJobTree

Class to create and submit a chain of job scripts that are defined as iteration steps.

Example of a tree defined for name ‘appl’:

appl
    .build
    .init
         .emis
         .obs
             .point
             .sat
             .valid
    .run
        .iter-0001
                  .fwd
                  .dep
                  .grd
                  .opt
    .run
        .iter-0002
                  .fwd
                  .dep
                  .grd
                  .opt
               :
    .done

This will create the job chain:

appl.jb
appl_build.jb
appl_init.jb                 (v)
appl_init_emis.jb
appl_init_obs.jb             (v)
appl_init_obs_point.jb
appl_init_obs_sat.jb
appl_init_obs_valid.jb
appl_run.jb                  (v)
appl_run_iter-0001.jb        (v)
appl_run_iter-0001_fwd.jb
appl_run_iter-0001_dep.jb
appl_run_iter-0001_grd.jb
appl_run_iter-0001_opt.jb
appl_run_iter-0002.jb        (v)
appl_run_iter-0002_fwd.jb
appl_run_iter-0002_dep.jb
appl_run_iter-0002_grd.jb
appl_run_iter-0002_opt.jb
                 :
appl_done.jb

The jobs marked with “(v)” are virtual and are not actually created; see the description of virtual jobs.
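The mapping from tree elements to job files can be illustrated with a small recursive walk. This is a sketch under stated assumptions: the tree is represented as a hypothetical nested dict, virtual jobs are given as a set of full element names, and the function `job_chain` is not part of utopya; it only reproduces the naming pattern visible in the chain listed above (dots replaced by underscores, extension `.jb`).

```python
from typing import Dict, List, Set

# Hypothetical nested-dict representation of a job tree:
# each key is an element name, each value holds its sub-elements.
Tree = Dict[str, "Tree"]


def job_chain(name: str, tree: Tree, virtual: Set[str]) -> List[str]:
    """Return job file names in submission order, skipping virtual jobs."""
    chain = []
    if name not in virtual:
        # e.g. 'appl.run.iter-0001.fwd' -> 'appl_run_iter-0001_fwd.jb'
        chain.append(name.replace(".", "_") + ".jb")
    # depth-first: each element is followed by its own sub-elements
    for element, subtree in tree.items():
        chain.extend(job_chain(f"{name}.{element}", subtree, virtual))
    return chain
```

For example, `job_chain("appl", {"build": {}, "init": {"emis": {}}}, {"appl.init"})` gives `['appl.jb', 'appl_build.jb', 'appl_init_emis.jb']`, since the virtual `appl.init` produces no file of its own.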

The iteration list is defined by the following properties:

  • a generic name used to read settings;

  • a format to create an element name (iteration) from an integer step number;

  • the initial step number;

  • a maximum possible step number, used to perform a loop over possible names.

Example rcfile settings for name ‘appl.run’ that define the iteration steps:

! job iteration class:
appl.run.class                                   :  utopya.UtopyaJobIteration

! generic for step name used in settings:
appl.run.generic                                 :  iter-NNNN
! formatting rule for actual step names given
! an integer step number;
! syntax should follow str.format() rules;
! here 4 digits with zero padding:
appl.run.step_format                             :  iter-{step:0>4}
! initial step number:
appl.run.step_start                              :  1
! maximum possible number for defined format:
appl.run.step_max                                :  9999
! optional step size, default 1:
appl.run.step_size                               :  1
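The effect of these settings can be checked directly with Python's `str.format()`, which the `step_format` value is said to follow. The helper `step_names` below is hypothetical and simply expands the format for every step from `step_start` to `step_max` (inclusive) with increment `step_size`; whether utopya includes `step_max` itself is an assumption here.

```python
def step_names(step_format: str, step_start: int, step_max: int,
               step_size: int = 1) -> list:
    # expand the format for each step number in the configured range
    return [step_format.format(step=step)
            for step in range(step_start, step_max + 1, step_size)]


names = step_names("iter-{step:0>4}", 1, 9999)
print(names[:3])   # ['iter-0001', 'iter-0002', 'iter-0003']
print(names[-1])   # 'iter-9999'
```

The format spec `0>4` means: fill with `0`, right-align, width 4, which gives the zero-padded names used in the job chain.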

For the above example, each iteration consists of 4 sub-jobs. Define these using the generic name; optionally, mark the iteration job as virtual:

! sub list:
appl.run.iter-NNNN.class                        :  utopya.UtopyaJobTree
appl.run.iter-NNNN.virtual                      :  True
appl.run.iter-NNNN.elements                     :  fwd dep grd opt

The method CheckContinuation() from the base class is re-implemented to decide, given a step number, whether a next iteration step should be performed or the loop should be terminated. For derived classes that implement a new iteration loop, it might be sufficient to re-define only this method in order to terminate the loop at the right step.

GetStepNumbers(element)

Extract the step number from the element name, and also return the maximum step number and the step size. This method is used by ‘GetNextElement’ and ‘CheckContinuation’ to translate a job name to a number and decide on the next step.

Return value is a three-element tuple of integers:

(step, step_max, step_size)

The current implementation uses the step range setting in the rcfile (from ‘step_start’ to ‘step_max’) to loop over the possible step numbers. For each step, the ‘step_format’ is evaluated and compared to the provided element; if they match, the step number is known. In the future, this brute-force test should be replaced by a more elegant method that reads the number directly from the element name given the format.
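The brute-force lookup described above, and one possible direct alternative, can be sketched as follows. Both functions are hypothetical: `step_number_bruteforce` mirrors the loop as described, while `step_number_regex` is a swapped-in technique that only works for formats of the shape prefix plus a zero-padded number (such as `iter-{step:0>4}`); it is not a general parser for `str.format()` specifications.

```python
import re
from typing import Optional


def step_number_bruteforce(element: str, step_format: str, step_start: int,
                           step_max: int, step_size: int = 1) -> Optional[int]:
    # try every possible step and compare the formatted name with the element
    for step in range(step_start, step_max + 1, step_size):
        if step_format.format(step=step) == element:
            return step
    return None


def step_number_regex(element: str, prefix: str = "iter-",
                      ndigit: int = 4) -> Optional[int]:
    # hypothetical direct parse, limited to '<prefix><ndigit digits>' names
    m = re.fullmatch(re.escape(prefix) + r"(\d{%d})" % ndigit, element)
    return int(m.group(1)) if m else None
```

The brute-force version costs up to `step_max` string formats per call, which is negligible for a few thousand steps, but the regex version runs in constant time.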

GetVariables(element)

Return a dictionary with job variables; for this class, the iteration step:

{ '__step__' : 4 }

GetNextElement(element, indent='', parfirst=None)

Return name of next element (if present).

For example, if the element passed as argument is ‘iter-0004’, then the returned value is ‘iter-0005’.

If the element passed as argument is ‘None’, then the name that corresponds to the first iteration step is returned.
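The behaviour of GetNextElement can be sketched as a standalone function. This is an illustration under assumptions, not the class method: `next_element` is hypothetical, it reuses the brute-force step lookup from the GetStepNumbers description, and it returns None when the next step would exceed `step_max` (the real method may handle that case differently).

```python
from typing import Optional


def next_element(element: Optional[str], step_format: str, step_start: int,
                 step_max: int, step_size: int = 1) -> Optional[str]:
    if element is None:
        # no element yet: return the name of the first iteration step
        return step_format.format(step=step_start)
    # brute-force lookup of the current step number
    for step in range(step_start, step_max + 1, step_size):
        if step_format.format(step=step) == element:
            nxt = step + step_size
            # assumption: no next element beyond the maximum step
            return step_format.format(step=nxt) if nxt <= step_max else None
    raise ValueError(f"element {element!r} does not match {step_format!r}")
```

With the rcfile settings shown earlier, `next_element("iter-0004", "iter-{step:0>4}", 1, 9999)` returns `'iter-0005'` and `next_element(None, ...)` returns `'iter-0001'`.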

CheckContinuation(element)

Check iteration status given job name. The job name could be used to derive the iteration step number to decide whether a maximum is reached, or it could be used to read output files to decide whether convergence is reached.

Returns two str values:

  • order : one of ‘continue’, ‘finish’, ‘error’

  • msg : informative message to explain the order

In this implementation, the loop is finished if the maximum step number is reached, defined by the ‘step_max’ value in the settings. The current step number and the maximum are obtained from a call to the GetStepNumbers() method.
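The decision logic can be sketched as a function returning the (order, msg) pair. The name `check_continuation` is hypothetical, and the exact finish condition is an assumption: here the loop finishes once no further step fits within `step_max`, which coincides with "the maximum is reached" when `step_size` is 1 and also avoids requesting a step beyond the maximum for larger step sizes.

```python
from typing import Tuple


def check_continuation(step: int, step_max: int,
                       step_size: int = 1) -> Tuple[str, str]:
    # assumption: finish once the next step would exceed step_max
    if step + step_size > step_max:
        return "finish", f"maximum step {step_max} reached at step {step}"
    return "continue", f"step {step} done, maximum is {step_max}"
```

A derived class that stops on convergence instead would keep the same return convention and replace only the condition.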

class utopya_jobtree.UtopyaJobIteration_CheckFile(name, rcfile, rcbase='', env={})

Bases: utopya_jobtree.UtopyaJobIteration

UTOPyA JobIteration class with the CheckContinuation() method defined to read instructions from an input file.

CheckContinuation(element)

Check iteration status by reading a text file named:

<name>.<element>.msg

The text file should consist of 2 lines that are read and provided as return values:

  • order : one of ‘continue’, ‘finish’, ‘error’

  • msg : informative message to explain the order
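Reading such a message file takes only a few lines. This is a minimal sketch: the function `read_continuation_file`, the `workdir` argument, and the validation of the order value are assumptions for illustration; where utopya actually looks for the file is not specified here.

```python
from pathlib import Path
from typing import Tuple

ORDERS = ("continue", "finish", "error")


def read_continuation_file(name: str, element: str,
                           workdir: str = ".") -> Tuple[str, str]:
    # file name follows the pattern '<name>.<element>.msg'
    path = Path(workdir) / f"{name}.{element}.msg"
    lines = path.read_text().splitlines()
    if len(lines) < 2:
        raise ValueError(f"expected 2 lines in {path}, found {len(lines)}")
    order, msg = lines[0].strip(), lines[1].strip()
    if order not in ORDERS:
        raise ValueError(f"unsupported order {order!r} in {path}")
    return order, msg
```

A file `appl.run.iter-0003.msg` with the lines `finish` and `converged after 3 steps` would then yield `('finish', 'converged after 3 steps')`.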

class utopya_jobtree.UtopyaJobTask(msg=None)

Bases: object

Dummy class for illustration of the UtopyaJobStep class and its derivatives.

In this implementation, an optional str message can be passed on initialization, which is printed if present.
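A user-defined task would take the place of this dummy. The class below is a hypothetical sketch that follows the pattern described here, with all work done during initialization; the name `MyJobTask` is an assumption, and a real task would replace the `print` with the actual processing.

```python
class MyJobTask:
    """Hypothetical task following the UtopyaJobTask pattern:
    the work is performed when the instance is created."""

    def __init__(self, msg=None):
        self.msg = msg
        # placeholder for the user-defined work; here only the message
        if msg is not None:
            print(msg)
```

It could then be referenced from the rcfile in the same way as the dummy class, for example through a `<name>.task.class` setting pointing at the module that defines it.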

class utopya_jobtree.UtopyaJobTaskSubmit(name, rcfile, rcbase='', env={}, name_generic=None, without_next=False, msg=None, _indent='')

Bases: utopya_jobtree.UtopyaJobTask

Job task to create and submit a job(tree).

Arguments:

  • name : name of the job(tree) settings

  • rcfile : settings file

Optional arguments:

  • rcbase, env : initialization arguments for rcfile

  • name_generic : generic name for settings, for example without iteration number expanded

  • without_next : if True, do not submit next step in jobtree (used for testing)

  • msg : informative message passed to base class (used for testing)

For example, with name equal to ‘appl’, the first lines of the settings could be:

! single job:
appl.class        :  utopya.UtopyaJobStep

! task:
appl.task.class   :  utopya.UtopyaJobTask
appl.task.args    :  msg='Perform application task.'

An instance of the configured class (derived from UtopyaJobStep) is created, and after initialization its Run method is called.
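How settings such as `appl.task.class` and `appl.task.args` might be turned into an object can be sketched with the standard library. The helpers `load_class`, `parse_task_args` and `create_task` are hypothetical; in particular, parsing the args string as Python keyword arguments with `ast` is an assumption about the rcfile syntax, not a description of how utopya does it.

```python
import ast
import importlib
from typing import Any, Dict


def load_class(path: str):
    # 'utopya.UtopyaJobTask' -> module 'utopya', attribute 'UtopyaJobTask'
    module_name, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def parse_task_args(args: str) -> Dict[str, Any]:
    """Turn a value like "msg='Perform application task.'" into kwargs."""
    call = ast.parse(f"f({args})", mode="eval").body
    if call.args:
        raise ValueError("only keyword arguments are supported in this sketch")
    # literal_eval only accepts literals, so no arbitrary code is run
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def create_task(class_path: str, args: str = ""):
    return load_class(class_path)(**parse_task_args(args))
```

For the settings above this would amount to `create_task("utopya.UtopyaJobTask", "msg='Perform application task.'")`.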

class utopya_jobtree.UtopyaJobTaskRun(command='')

Bases: utopya_jobtree.UtopyaJobTask

Job task to run executable.

The argument specifies the command to run, for example:

'appl.x --flag=1 input.txt'

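Running such a command string from Python can be sketched with `shlex` and `subprocess`. The function `run_command` is hypothetical; splitting the string with `shlex.split` and running it without a shell is a design choice of this sketch, and returning the exit code (instead of raising on failure) is an assumption, since the behaviour of UtopyaJobTaskRun on a failing command is not described here.

```python
import shlex
import subprocess


def run_command(command: str) -> int:
    # split like a POSIX shell would, but run without invoking a shell
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    result = subprocess.run(argv, check=False)
    return result.returncode
```

Avoiding `shell=True` keeps quoting predictable: `'appl.x --flag=1 input.txt'` becomes the argument list `['appl.x', '--flag=1', 'input.txt']`.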
class utopya_jobtree.UtopyaJobTaskWait(nsec=5, nsecinfo=None, msg=None)

Bases: utopya_jobtree.UtopyaJobTask

Job task to wait for a while, sometimes useful for testing.

Arguments:

  • nsec : number of seconds to wait

  • nsecinfo : show message after specified number of seconds

  • msg : informative message
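A waiting task with progress messages can be sketched as below. The function `wait_task` is hypothetical, and the meaning of `nsecinfo` is interpreted here as "print a progress line every `nsecinfo` seconds"; the actual class may show a single message instead.

```python
import time
from typing import Optional


def wait_task(nsec: float = 5, nsecinfo: Optional[float] = None,
              msg: Optional[str] = None) -> None:
    if msg is not None:
        print(msg)
    start = time.monotonic()
    # assumption: progress line every 'nsecinfo' seconds (ignored if <= 0)
    next_info = nsecinfo if nsecinfo and nsecinfo > 0 else None
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= nsec:
            break
        if next_info is not None and elapsed >= next_info:
            print(f"  waited {elapsed:.1f} of {nsec} seconds ...")
            next_info += nsecinfo
        # short sleeps keep the progress messages close to their schedule
        time.sleep(min(0.05, nsec - elapsed))
```

Using `time.monotonic()` rather than wall-clock time keeps the loop correct if the system clock is adjusted while waiting.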

class utopya_jobtree.UtopyaJobTreeTiming(name, rcfile, rcbase='', env={}, html=False)

Bases: utopya_rc.UtopyaRc

Collect timing profiles written by jobs in a job tree, and create an overall timing profile.

Arguments:

  • name : base name of job tree

  • rcfile : settings for jobtree with specified base name

Optional arguments:

  • rcbase, env : used for initialization of rcfile configuration

  • html : if enabled, create an HTML index page with bar plots of the timing
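One way to form an overall profile is to add the time of each job to all of its parents in the tree. The sketch below assumes, hypothetically, that the per-job timing profiles have already been reduced to a dict of seconds keyed by dotted element name; the actual profile file format and the aggregation done by UtopyaJobTreeTiming are not described here.

```python
from collections import defaultdict
from typing import Dict


def rollup_timing(job_times: Dict[str, float]) -> Dict[str, float]:
    """Add each job's time to itself and to every parent element."""
    totals: Dict[str, float] = defaultdict(float)
    for name, seconds in job_times.items():
        parts = name.split(".")
        # 'appl.run.iter-0001.fwd' contributes to 'appl', 'appl.run', ...
        for i in range(1, len(parts) + 1):
            totals[".".join(parts[:i])] += seconds
    return dict(totals)
```

With times of 10 s for `appl.build` and 5 s plus 7 s for two sub-jobs of `appl.run.iter-0001`, the totals would be 12 s for `appl.run` and 22 s for `appl`.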