utopya_jobtree
module¶
This module defines classes to create a sequence of job scripts.
The UtopyaJobStep class can be used to create a single element of a sequence. It can create and submit a stand-alone job file performing some task configured by rcfile settings. In addition, it includes methods to create and submit a following job; these are used by the UtopyaJobTree class to generate a sequence of jobs.
The UtopyaJobTree class creates a sequence of jobs, configured as a tree with a main job and sub-jobs, which can have sub-jobs too, and so on.
The UtopyaJobIteration class creates a job tree with sub-jobs defined by an iteration number. The iteration over sub-jobs is stopped if certain conditions are reached.
The following classes are defined to perform actual tasks:
The dummy class UtopyaJobTask is used in the examples to show where a user-defined task should be defined.
The UtopyaJobTaskSubmit class can be used to create and submit a job tree (as defined in this module); this is useful to let a job submit other jobs and continue with other work.
The UtopyaJobTaskRun class calls an external program, for example to postprocess created output.
If job timing is enabled, the UtopyaJobTreeTiming class can be used to summarize the time spent on the various parts of the job tree; see the description of the UtopyaJobStep class for details.
Class hierarchy¶
The classes are defined according to the following hierarchy:
Classes¶
- class utopya_jobtree.UtopyaJobStep(name, rcfile, rcbase='', env={})¶
Bases:
utopya_rc.UtopyaRc
Base class for single job step.
The ‘UtopyaJobStep’ class and its derivatives are used to form a chain of jobs, where each job contains lines that create and submit the next job. The settings in the rcfile define for each of the jobs the class, where to submit the job file to (foreground or batch system), and a task to be performed.
Because this is the base class for the UtopyaJobTree class that defines an actual sequence, a number of methods are included to create and submit a next job. Without using these, the class can be used to create a single stand-alone job file performing some task configured by rcfile settings.
Simple usage
Example of simple usage:
    # init UtopyaJobStep with name 'appl', and read settings for this name:
    jbs = UtopyaJobStep( 'appl', 'settings.rc', rcbase='', env={} )
    # write first job and submit:
    jbs.Start()
As shown in the example, the following arguments are used on initialization:
name : Job step name, used to read settings and form the job file name.
rcfile : Name of settings file.
rcbase : Optional prefix for keywords in rcfile.
env : Optional dictionary with variables that will be exported to the environment.
An example of the rcfile settings needed for a job named ‘appl’ that should be submitted to an LSF queue:
    ! setup logging:
    *.logging.level : info
    *.logging.format : %(asctime)s [%(levelname)-8s] %(message)s
    ! class to create and submit this job within the jobtree:
    appl.class : utopya.UtopyaJobStep
    ! class with the job script creator:
    appl.script.class : utopya.UtopyaJobScriptLSF
    ! work directory, here formed from job name "appl":
    appl.workdir : ${my.work}/__NAME2PATH__
    ! (optional) add line to change to work directory,
    ! for example because job scheduler does not do that:
    appl.cwd : True
    ! (optional) python search path:
    appl.pypath : ./py
    ! (optional) environment modules:
    appl.modules : load netcdf/4.4.1 ; load python/2.7
    ! (optional) extra environment variables that will be used
    ! to setup the next jobscript:
    appl.env : HOMEDIR="${PWD}", WORKDIR="/work/me/appl"
    ! (optional) extra lines:
    appl.lines : os.environ['OMP_NUM_THREADS'] = os.environ['JOB_NTHREAD']
    ! task:
    appl.task.class : utopya.UtopyaJobTask
    appl.task.args : msg='Perform application task.'
The logging settings are optional, but can be changed to enable printing of debug messages or to change the format of message lines. See the AddLogging() method for details.
The first specific setting is the class of the job step in <name>.class. This setting is required for automatic creation of jobs by derived classes such as UtopyaJobTree and UtopyaJobIteration, where the creating job needs to know how to form the next job.
The next setting is <name>.script.class, which specifies the name of the class used to create and submit the job script. This class should be derived from the UtopyaJobScript class; see the utopya_jobscript module for the classes available by default. The above example settings will run the script in foreground. If the job should be submitted to a queue, use a different script class and specify the job options; an example for the LSF queue system is:
    ! Specify the module and class with the job script creator:
    appl.script.class : utopya.UtopyaJobScriptLSF
    ! batch options for "UtopyaJobScriptLSF" class:
    appl.batch.lsf.format : batch.lsf.format
    appl.batch.lsf.options : name output error
    appl.batch.lsf.option.name : J %(env:name)
    appl.batch.lsf.option.output : oo %(name).out
    appl.batch.lsf.option.error : eo %(name).err
    ! Define format of lsf options, here:
    ! #BSUB -flag value
    batch.lsf.format.comment : #
    batch.lsf.format.prefix : BSUB
    batch.lsf.format.arg : -
    batch.lsf.format.assign : ' '
    batch.lsf.format.template : %(key)
    batch.lsf.format.envtemplate : %(env:key)
The python search path (for the utopya modules) can be specified using the <name>.pypath setting, read on initialization. Multiple search directories can be defined using a ":"-separated list. Note that this setting is not required and may be omitted.
Another optional setting is the definition of extra environment variables using the <name>.env setting, read on initialization. The specified content is converted to a dictionary, and the key/value pairs are exported to the environment.
The actual work should be done by one or more objects, for which the class and initialization arguments are defined by <name>.<task>.* settings. For the above example, the following lines will be inserted in the job script:
    tskclass = utopya.ImportClass( "utopya.UtopyaJobTask" )
    tsk = tskclass(msg='Perform application task.')
The arguments could include the template ‘%{rcfile}’ to insert the name of the settings file:
    # use "settings.rc" from the example:
    appl.task.args : rcfile='%{rcfile}', rcbase='appl'
Similarly, the templates ‘%{workdir}’ and ‘%{name}’ can be used to insert the work directory and job name respectively; note that the work directory will always contain the pre-processed settings that were used to create the job in the file ‘%{name}.rc’:
    # use "task.rc" from the example:
    appl.task.args : rcfile='%{workdir}/%{name}.rc', rcbase='appl'
The task part in the rc keys is actually a value out of a list. Use the following settings to define a list of three tasks:
    ! tasks:
    appl.tasks : wakeup work sleep
    ! task:
    appl.wakeup.class : utopya.UtopyaJobTask
    appl.wakeup.args : msg='Wake up!'
    ! task:
    appl.work.class : utopya.UtopyaJobTask
    appl.work.args : msg='Work ...'
    ! task:
    appl.sleep.class : utopya.UtopyaJobTask
    appl.sleep.args : msg='... and go to sleep'
If no task list is specified, the default list has just a single element named task.
Job timing
The timing flag can be enabled to have special lines added to the job file to generate a profile of the run times:
    ! add timing statements to the job script (True|False) ?
    appl.timing : True
The timing code that is inserted uses classes from the utopya_timing module, and looks like:
    # start timing, use only element as name:
    timer = utopya.UtopyaTimerTree( "wakeup" )
    # task class:
    tskclass = utopya.ImportClass( "MyMod.DoSomething" )
    # create task object and initialize, which does the actual work:
    tsk = tskclass( 'first argument' )
    # assume that timing info is stored in the task object
    # as an attribute named "timer" ...
    if hasattr(tsk,"timer") :
        # store as branch:
        timer.AddBranch( tsk.timer )
    #endif
    # stop timing:
    timer.End()
    # postprocess (add "other" branches):
    timer.Post()
    # save:
    timer.Save( "appl.wakeup" )
The first command that is inserted creates an object of the UtopyaTimerTree class. This object can hold a tree of timing objects, each consisting of at least a name (str) and the number of seconds spent (float). The name of the timer will be the name of the job, in this example appl.wakeup. The stopwatch is started immediately at initialization, and is stopped by the UtopyaTimerTree.End() method called in the last block; the task for which the run time should be collected is performed with a task object created in between. The task object may use the classes from the utopya_timer module too. If these are stored in an attribute timer, then the lines in the third block will add these “sub” timers as a branch to the “main” timer, to give more detail on where the time is actually spent. The last lines that are inserted postprocess the timer tree and write the content of the tree to a file:
    appl.wakeup.prf
The special UtopyaJobTreeTiming class can be used as a job task to collect all saved time profiles in the job tree and create an overall timing tree that illustrates the total run time spent on the jobs.
Job chain settings
In case of a job chain, a finished job sets up and submits the next one. The following optional setting is read to specify the work directory where to write the next job file (default is current directory):
    ! (optional) working directory:
    appl.wakeup.workdir : /scratch/you/appl-dir
A special template __NAME2PATH__ can be included in the path to insert the job name (appl.wakeup) with the dots replaced by path-separation characters; for example:
    ! working directory incl. subdirs for name elements:
    appl.wakeup.workdir : /scratch/you/__NAME2PATH__
will be expanded to:
    /scratch/you/appl/wakeup
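The __NAME2PATH__ expansion can be illustrated with a minimal sketch. The helper name expand_name2path and its sep argument are hypothetical and not part of the utopya API; the sketch only reproduces the substitution described above.

```python
import os

def expand_name2path(path, jobname, sep=os.sep):
    """Replace the __NAME2PATH__ template by the job name,
    with the dots replaced by path separators (hypothetical helper)."""
    return path.replace('__NAME2PATH__', jobname.replace('.', sep))

# example from the text, with an explicit '/' separator:
print(expand_name2path('/scratch/you/__NAME2PATH__', 'appl.wakeup', sep='/'))
```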
The rcfile used to initialize the object of the next job can be specified as well (default is the rcfile used for the finished job):
    ! (optional) settings to initialize the job step:
    appl.wakeup.rcfile : rc/my-appl-wakeup.rc
Overview of methods
With the above settings it should be possible to define all application-dependent features of the jobs. If it is necessary to extend the class, the following is an overview of the underlying methods.
Start the job step (stand-alone), or the first job in a sequence (tree), by calling the following method:
The job scripts are created and submitted with method:
The ‘Run’ method uses the following methods to fill the content of the job file:
A job step might define special variables that can be passed in the task arguments. The following method is used to define these:
The job file is written and submitted using:
GetJobFileName()
The following methods will be used to obtain the next jobname in the chain:
In case the job chain consists of an iteration sequence, a method should be provided to decide on continuation or termination of the sequence:
- GetGenericName()¶
Return own generic name, useful to obtain extra settings by derived classes.
- GetVariables(element)¶
Return dictionary with job variables for this class. This is used by the AddVariables() method to add definition lines to the job script.
- Append(line)¶
Add line to script content. A newline is added automatically.
- AddHeader()¶
Add interpreter line to script content, here for a python script.
- AddOptions()¶
Add batch job options to script content.
If and how job options are formed is controlled by the script class, defined in the rcfile for each job. Example for a job named appl:
    ! Specify the module and class with the job script creator:
    appl.script.class : utopya.JobScriptLSF
The script class is derived from the JobScript class, and has a method JobScript.GetOptionsRc() to form lines that can be added to the script. The method reads settings from the rcfile for the provided generic name:
    ! batch options:
    appl.batch.lsf.format : lsf_format
    appl.batch.lsf.options : name output error workdir
    appl.batch.lsf.option.name : J %(env:name)
    appl.batch.lsf.option.output : oo %(name).out
    appl.batch.lsf.option.error : eo %(name).err
    appl.batch.lsf.option.workdir : cwd %(env:cwd)
An environment is passed with pre-defined values that can be substituted in the option values, here for the actual job name and the current working directory:
    env = { 'name' : 'step0001', 'cwd' : '/scratch/me/test' }
With the following ‘lsf_format’ option formatting:
    ! Define format of lsf options, here:
    ! #BSUB -flag value
    lsf_format.comment : #
    lsf_format.prefix : BSUB
    lsf_format.arg : -
    lsf_format.assign : ' '
    lsf_format.template : %(key)
    lsf_format.envtemplate : %(env:key)
this will lead to the following option lines:
    #BSUB -J step0001
    #BSUB -oo step0001.out
    #BSUB -eo step0001.err
    #BSUB -cwd /scratch/me/test
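The way the option settings, the environment and the format combine can be sketched as follows. This is not the implementation of JobScript.GetOptionsRc(); the function format_options is hypothetical. It also assumes that a ‘%(key)’ template refers to the value of a previously defined option, which is consistent with the example output but is not confirmed by the text.

```python
import re

def format_options(options, env, comment='#', prefix='BSUB', arg='-', assign=' '):
    """Form batch option lines from {key: 'flag value'} settings (hypothetical helper)."""
    values = {}
    lines = []
    for key, setting in options.items():
        # first word is the flag, the remainder is the value template:
        flag, value = setting.split(None, 1)
        # substitute '%(env:xxx)' templates from the environment dictionary:
        value = re.sub(r'%\(env:(\w+)\)', lambda m: env[m.group(1)], value)
        # assumption: '%(xxx)' refers to the value of an option defined earlier:
        value = re.sub(r'%\((\w+)\)', lambda m: values[m.group(1)], value)
        values[key] = value
        lines.append('%s%s %s%s%s%s' % (comment, prefix, arg, flag, assign, value))
    return lines

options = {
    'name': 'J %(env:name)',
    'output': 'oo %(name).out',
    'error': 'eo %(name).err',
    'workdir': 'cwd %(env:cwd)',
}
env = {'name': 'step0001', 'cwd': '/scratch/me/test'}
for line in format_options(options, env):
    print(line)
```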
- AddModules()¶
Add lines to import standard modules (os, sys) and tool modules (utopya).
- AddLogging()¶
Add script lines to configure how the logging module displays messages. The logging module is used everywhere in the UTOPyA code to print messages:
    # modules:
    import logging
    # info ...
    logging.info   ( 'this is an informative message, ..' )
    logging.warning( '... this is a warning, ...' )
    logging.debug  ( '... this shows a message for debugging, ...' )
    logging.error  ( '... and this is an error message.' )
The rcfile can contain optional settings for the message level above which messages are shown, and the format of the messages:
    ! (optional) message level: info | debug
    <name>.logging.level : info
    ! (optional) message formatting:
    <name>.logging.format : [%(levelname)-8s] %(message)s
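How a level name and a format string of this kind translate into a configuration of the standard logging module can be sketched as follows. The lines that AddLogging() actually inserts are not shown in this description, so the function setup_logging is a hypothetical stand-in that uses only the standard library (Python 3.8 or later for the force argument).

```python
import logging

def setup_logging(level_name='info', fmt='[%(levelname)-8s] %(message)s'):
    """Configure the root logger from a level name and a format string
    (hypothetical helper, not part of utopya)."""
    # translate 'info' or 'debug' into the numeric logging level:
    level = getattr(logging, level_name.upper())
    # 'force=True' replaces any handlers that were configured before:
    logging.basicConfig(level=level, format=fmt, force=True)
    return level

setup_logging('debug')
logging.debug('this debug message is now shown')
```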
See also:
- AddEnvModules()¶
Add script lines for GNU Environment modules. On many computing platforms, the environment for running applications is managed using ‘module’ commands, e.g.:
    module load netcdf
    module load python
The module commands to be performed are optionally defined in the rcfile for the current job as a semicolon-separated list:
<name>.modules : load netcdf ; load python
Appropriate job lines for these settings will be inserted in the script.
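As a rough illustration, the setting value can be split into the equivalent ‘module’ commands as in the sketch below. The function module_commands is hypothetical; the job lines that AddEnvModules() really inserts are not shown in this description and may look different.

```python
def module_commands(setting):
    """Split a semicolon-separated modules setting into 'module ...' commands
    (hypothetical helper)."""
    return ['module ' + part.strip() for part in setting.split(';') if part.strip()]

for command in module_commands('load netcdf ; load python'):
    print(command)
```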
The location of the GNU module scripts should be available in the environment:
MODULESHOME=/opt/modules/3.2.10.4
If the location cannot be set correctly in the environment, override it with the following setting:
<name>.moduleshome : /opt/modules/3.2.10.4
- AddLines()¶
Add user defined script lines. Could be used to setup special environment.
For example, if the job scheduler defines an environment variable ‘JOB_NTHREAD’ for the number of OpenMP threads, the jobscript could use this to define the ‘OMP_NUM_THREADS’ variable needed by OpenMP code. In a python script, this looks like:
os.environ['OMP_NUM_THREADS'] = os.environ['JOB_NTHREAD']
Specify this code in the settings:
<name>.lines : os.environ['OMP_NUM_THREADS'] = os.environ['JOB_NTHREAD']
For a complete block of code, use ‘\n’ marks for line breaks, ‘\t’ for leading indents, and a multi-line rc value for a more readable definition. For example, the following definition:
    <name>.lines : \n\
        print( 'environment:' )\n\
        for key in os.environ.keys() :\n\
        \tprint( ' %s=%s' % (key,os.environ[key]) )
will be expanded to:
    print( 'environment:' )
    for key in os.environ.keys() :
        print( ' %s=%s' % (key,os.environ[key]) )
- AddCwd(_indent='')¶
Add script lines to change to the work directory. This is necessary in case the job scheduler does not change to the directory of the job script, or if the job options do not have a flag to specify it. These lines are only included if the following flag is set:
<name>.cwd : True
- AddVariables()¶
Add commands to set job variables that might be used by the tasks.
Always a variable ‘name’ is defined with the job name.
Extra variables might be defined by specific classes, for example an iteration class might define the iteration step. These are collected in a dictionary named “env” with keys formed from the previous jobs in the tree. An iteration job named ‘appl.run’ stores for example:
env["appl.run.__step__"] = 4
- AddTasks()¶
Add task command to script lines, consisting of a class import and the initialization of an object of this class. The class name and the arguments for the initialization are defined in the rcfile settings:
    <name>.task.class : mymod.MyTask
    <name>.task.args : msg='Do something'
This will insert the following lines in the job script:
    tskclass = utopya.ImportClass( "mymod.MyTask" )
    tsk = tskclass(msg='Do something')
The arguments can include the following templates to insert current values:
%{name} : full job name
%{root} : root of name, thus without last element
%{root_generic} : root of generic name
%{rcfile} : file with jobtree settings
%{workdir} : work directory
Example of usage:
appl.task.args : rcfile='%{rcfile}', rcbase='applx'
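The template substitution can be sketched with plain string replacement. The function expand_templates and the example values are hypothetical; the sketch only illustrates the ‘%{key}’ syntax described above.

```python
def expand_templates(args, values):
    """Replace '%{key}' templates in a task argument string (hypothetical helper)."""
    for key, value in values.items():
        args = args.replace('%{' + key + '}', value)
    return args

# illustrative values, not taken from a real job:
values = {'name': 'appl', 'rcfile': 'settings.rc', 'workdir': '/work/me/appl'}
print(expand_templates("rcfile='%{rcfile}', rcbase='applx'", values))
print(expand_templates("rcfile='%{workdir}/%{name}.rc', rcbase='appl'", values))
```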
If the class name is left empty, nothing is inserted and no arguments need to be specified.
The task part in the rc keys is actually a value out of a list. Use the following settings to define a list of three tasks:
    ! tasks:
    appl.tasks : wakeup work sleep
    ! task:
    appl.wakeup.class : utopya.UtopyaJobTask
    appl.wakeup.args : msg='Wake up!'
    ! task:
    appl.work.class : utopya.UtopyaJobTask
    appl.work.args : msg='Work ...'
    ! task:
    appl.sleep.class : utopya.UtopyaJobTask
    appl.sleep.args : msg='... and go to sleep'
If no task list is specified, the default list has just a single element named task.
- AddNextJob(without_next=False)¶
Add lines that create and submit the next job step if necessary.
Add closing lines to script content.
- GetFileName(name)¶
Return name of job file to be written. Here use the job name and add extension ‘.jb’.
- WriteAndSubmit(_indent='')¶
Write script content to file with provided name, and submit the created file. Creation and submission is performed by an object derived from the JobScript class. The name of this class is defined in the settings, as well as the working directory (empty for current):
    <name>.script.class : utopya.JobScriptForeground
    <name>.workdir : /work/appl/run
- Run(without_next=False, single=False, _indent='')¶
Create and submit the job for the named item in the job step.
The content of the job file is filled using calls to methods:
AddNextJob() (not called if single=True)
The file is written and submitted by a call to the method:
WriteAndSubmit()
- Start(single=False)¶
Create object for the first job along the tree that is not marked as virtual, and call its Run() method.
The single flag is passed to the Run() method; if enabled, only a single job step is performed.
- CheckStatus(_indent='')¶
Check status of this job. This method will first look in the work directory for a file holding the process or job id:
<name>.pid
If not found, an error is raised. The file with the process id is then passed to the CheckStatus method of the script class that is used by this job.
- GetNextElement(element, parfirst=None, indent='')¶
Returns the next element in a list (iteration?) of job steps. For derived classes such as ‘UtopyaJobTree’ it is sufficient to re-define only this method.
If no next sub-element is available, the value ‘None’ is returned. If the requested element is ‘None’, the name of the first sub-element is returned (if present).
- GetNextName(finish=False, indent='', check_jump=True, parfirst=False)¶
Return information on the next job name in a chain, including information on performing a test on continuation of the chain (if necessary).
Two values are returned:
the next job name;
the name of the job step that should decide on continuation (or ‘None’ if not needed).
In case the job chain is a list with a flexible end (for example an iteration sequence), the next job after the end is returned if the finish flag is enabled.
In case the job is an element of a parallel list and the parfirst flag is enabled, the next job after the end of the list is returned.
The name of the next job is read from the job tree definition in the rcfile. The next job can also be specified directly using an ad hoc setting, which is useful to skip a part of the tree:
    <name>.jump-to : nextname
This feature can be disabled with check_jump=False.
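The effect of the ‘jump-to’ setting and the check_jump flag can be sketched as follows. The function resolve_next_name and the plain dictionary used for the settings are hypothetical simplifications; the real method reads the rcfile and returns more information.

```python
def resolve_next_name(name, settings, tree_next, check_jump=True):
    """Return the next job name, honouring an optional '<name>.jump-to' setting
    (hypothetical helper)."""
    if check_jump:
        # the ad hoc setting overrides the next name from the tree definition:
        return settings.get(name + '.jump-to', tree_next)
    return tree_next

settings = {'appl.build.jump-to': 'appl.run'}
print(resolve_next_name('appl.build', settings, 'appl.init'))
print(resolve_next_name('appl.build', settings, 'appl.init', check_jump=False))
```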
- GetFirstName(name, indent='')¶
Return first name along tree that is not marked as “virtual”.
- CheckContinuation(element)¶
Check status given the job name and its generic representation. The job name can be used to derive the iteration step number and to read output files; the generic name is useful to read settings.
- Returns two str values:
order : one of ‘continue’, ‘finish’, ‘error’
msg : informative message to explain the order
Here always return ‘finish’ since this is a single job.
- class utopya_jobtree.UtopyaJobTree(name, rcfile, rcbase='', env={})¶
Bases:
utopya_jobtree.UtopyaJobStep
Class to create and submit a chain of job scripts defined in the rcfile as a tree with branches.
Example of a tree defined for name ‘appl’:
    appl
        .build
        .init
            .emis
            .obs
                .point
                .sat
                .valid
        .run
            .fwd
            .dep
            .grd
            .opt
        .done
This will create the job chain (see below for the meaning of “v”):
    appl.jb                    (v)
    appl.build.jb
    appl.init.jb               (v)
    appl.init.emis.jb
    appl.init.obs.jb           (v)
    appl.init.obs.point.jb
    appl.init.obs.sat.jb
    appl.init.obs.valid.jb
    appl.run.jb                (v)
    appl.run.fwd.jb
    appl.run.dep.jb
    appl.run.grd.jb
    appl.run.opt.jb
    appl.done.jb
The tree is defined by lists of element names. The definitions in the settings should first define the elements of the main trunk. Example for settings for a job named ‘appl’:
    ! class to create a job tree:
    appl.class : utopya.UtopyaJobTree
    ! list of sub-elements:
    appl.elements : build init run done
A job is also created for the trunk, in this case “appl”. This is useful in case resources (memory, cpu’s) should be allocated once for all jobs in a sub-tree; to achieve this, define resources for the trunk, and submit the elements to the foreground. The “trunk” job is skipped if it is declared to be “virtual”; in the above example, the jobs that can be skipped in this way are marked with a “v”. A trunk is declared virtual by an optional rc setting, which is False by default if not defined:
    ! virtual main job?
    appl.virtual : True
For each element in the list it is necessary to define the class that should be used to create it. For the “appl.build” job, this could be simply the “UtopyaJobStep” class since no sub-jobs are necessary:
    ! job step class for this branch:
    appl.build.class : utopya.UtopyaJobStep
For the “appl.init” job however, sub-jobs are defined. Use for this the “UtopyaJobTree” class again, and define a list with the sub elements:
    ! UtopyaJobTree class for this branch:
    appl.init.class : utopya.UtopyaJobTree
    appl.init.elements : emis obs
The elements are combined with the ‘parent’ elements and form together the full job name, for example “appl.init.emis”.
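How the element lists combine into full job names can be sketched with a small recursive function. The function job_chain and the dictionaries are hypothetical stand-ins for the rcfile settings, and the sketch assumes that every trunk in the example is declared virtual.

```python
def job_chain(name, elements, virtual=()):
    """Return the job names in chain order; 'elements' maps a job name
    to the list of its sub-element names (hypothetical helper)."""
    # a virtual trunk gets no job of its own:
    chain = [] if name in virtual else [name]
    for element in elements.get(name, []):
        # the full name is the parent name plus the element name:
        chain.extend(job_chain(name + '.' + element, elements, virtual))
    return chain

elements = {
    'appl': ['build', 'init', 'run', 'done'],
    'appl.init': ['emis', 'obs'],
    'appl.init.obs': ['point', 'sat', 'valid'],
    'appl.run': ['fwd', 'dep', 'grd', 'opt'],
}
# assumption: all trunks are declared virtual
virtual = {'appl', 'appl.init', 'appl.init.obs', 'appl.run'}
for jobname in job_chain('appl', elements, virtual):
    print(jobname + '.jb')
```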
For all names (non-virtual) in the tree, define the class that should be used to create and submit individual jobs. If the jobs are to be submitted to a queue, specify job options too. Example for the ‘appl.build’ step:
    ! submit to LSF queue:
    appl.build.script.class : utopya.JobScriptLSF
    ! batch options:
    appl.build.batch.lsf.format : batch.lsf
    appl.build.batch.lsf.options : name output error
    appl.build.batch.lsf.option.name : J %(env:name)
    appl.build.batch.lsf.option.output : oo %(name).out
    appl.build.batch.lsf.option.error : eo %(name).err
The actual work is again performed by an object derived from the UtopyaJobTask class, for which proper initialization arguments should be specified:
    appl.build.task.class : mymod.MyTask
    appl.build.task.args : msg='Do something'
While testing the job tree it is sometimes useful to skip a number of sub-jobs. This can be specified by a ‘jump-to’ setting. If this is present for a certain job name, the value should be the name of the next job that should be created. For example, to build new scripts and executables but skip the initialization steps, include in the settings:
appl.build.jump-to : appl.run
- GetNextElement(element, parfirst=None, indent='')¶
Returns next element from the list to which the requested element belongs. For example, for the name “init.obs” the list of elements is defined by:
init.obs.elements : point sat valid
In this example, the next element after “point” is “sat”.
If the requested element is ‘None’, return the first element (“point”). Otherwise, the requested element should be in the defined list, and either the next element is returned, or ‘None’ if the last element was requested.
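The lookup described here can be sketched as a function on a plain list. The name get_next_element is hypothetical; the real method reads the element list from the rcfile and takes additional arguments.

```python
def get_next_element(elements, element=None):
    """Return the element following 'element' in the list, the first element
    if 'element' is None, or None after the last one (hypothetical helper)."""
    if element is None:
        return elements[0] if elements else None
    # raises ValueError if the requested element is not in the list:
    index = elements.index(element)
    if index + 1 < len(elements):
        return elements[index + 1]
    return None

elements = ['point', 'sat', 'valid']
print(get_next_element(elements))
print(get_next_element(elements, 'point'))
print(get_next_element(elements, 'valid'))
```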
- class utopya_jobtree.UtopyaJobParallel(name, rcfile, rcbase='', env={})¶
Bases:
utopya_jobtree.UtopyaJobStep
Class to create and submit a series of job scripts that will run in parallel.
The elements of the series are defined by the following properties:
a generic name that is used to define the rcfile settings;
a format to create an element name (iteration) from an integer step number;
the first and the last ‘step’ number, each step number is one parallel job.
As example, the following settings define a series of 4 elements:
    ! job:
    appl.run.class : utopya.UtopyaJobParallel
    ! generic name for elements:
    appl.run.generic : part-NN
    ! formatting rule for actual step names given
    ! an integer step number;
    ! syntax should follow str.format() rules ;
    ! here 2 digits with zero padding:
    appl.run.step_format : part-{step:0>2}
    ! initial step numbers:
    appl.run.step_start : 1
    ! maximum possible number for defined format:
    appl.run.step_max : 4
It is sometimes useful to have a final element to collect the results of the parallel jobs; specify the element name with:
    ! add final element to collect results:
    appl.run.final : gather
The job class and other job settings such as the tasks to be performed should be defined using the generic name. The task class might need to know its own step number; it is available as ‘<name>.__step__’ in the job environment dictionary and can be used in the task class arguments. The following example defines jobs that wait for a number of seconds proportional to the step number using the UtopyaJobTaskWait class:
    ! job:
    appl.run.part-NN.class : utopya.UtopyaJobStep
    ! task:
    appl.run.part-NN.task.class : utopya.UtopyaJobTaskWait
    appl.run.part-NN.task.args : msg='Perform part of task.', \
        nsec=5*env['appl.run.__step__'], \
        nsecinfo=1
Only the last parallel job in the series will create the next job, which is the ‘gather’ job if defined or the next job in the tree. Typically the next job performs two tasks:
wait for all the parallel jobs to be finished using the UtopyaJobParallelWait class;
gather the output from the parallel jobs if necessary.
The following example illustrates how these tasks could be configured:
    ! job:
    appl.run.gather.class : utopya.UtopyaJobStep
    ! tasks:
    appl.run.gather.tasks : wait post
    ! task:
    appl.run.gather.task.wait.class : utopya.UtopyaJobParallelWait
    appl.run.gather.task.wait.args : 'appl.run', 'appl.run', '%{rcfile}'
    ! task:
    appl.run.gather.task.post.class : myTool.GatherPar
    appl.run.gather.task.post.args : 'appl.run', ...
- GetNextElement(element, parfirst=False, indent='')¶
Return name of next element.
As an example, if the element passed as argument is ‘part-02’, then the returned value is ‘part-03’.
If the element passed as argument is ‘None’, then the name that corresponds to the first iteration step is returned.
If the passed element is the last one, or if the passed element is not None but the parfirst flag is enabled, then ‘None’ is returned.
- GetStepNumber(element)¶
Extract step number from element name and return as integer.
Current implementation uses the step range setting in the rcfile (from ‘step_start’ to ‘step_max’) to perform a loop over possible step numbers. For each step, the ‘step_format’ is evaluated and compared to the provided element; if a match is found, the step is known. This brute-force test should in future be replaced by a more elegant method reading the number given the format.
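The brute-force search described here can be sketched as follows. The function get_step_number is a hypothetical stand-alone version that takes the settings as arguments instead of reading them from the rcfile.

```python
def get_step_number(element, step_format, step_start, step_max):
    """Find the step number for which the formatted name equals 'element'
    by trying all possible steps (hypothetical helper)."""
    for step in range(step_start, step_max + 1):
        # evaluate the format for this step and compare with the element name:
        if step_format.format(step=step) == element:
            return step
    raise ValueError('no step in %i..%i matches "%s"' % (step_start, step_max, element))

print(get_step_number('part-03', 'part-{step:0>2}', 1, 4))
```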
- GetVariables(element)¶
Return dictionary with job variables, for this class the iteration step:
{ '__step__' : 4 }
- AddTasks()¶
Add commands to the job script that submit sub-jobs in parallel. The lines contain a ‘for’ loop over the step numbers, where in each step the UtopyaJobTaskSubmit class is used to create and submit a job file for that particular step.
Only the last parallel job in the list will create the next job in the tree. Typically the next job first waits for all the parallel jobs to be finished using the UtopyaJobParallelWait class, and then gathers the output from the parallel jobs if necessary.
- AddNextJob(without_next=False)¶
Usually this method adds the lines that create and submit the next job step (if necessary), but for this class the last element of the parallel jobs will do that. Therefore this method just inserts some comment lines. The arguments are ignored.
- GetParallelJobs()¶
Return info on job names performed in parallel:
jobnames : list with actual job names;
jobname_generic : generic name of the jobs performed in parallel.
- class utopya_jobtree.UtopyaJobParallelWait(root, root_generic, rcfile, rcbase='', env={}, nsec=1, _indent='')¶
Bases:
utopya_rc.UtopyaRc
Class to wait for all elements of a parallel job to be finished.
This class will use the class definition to create a temporary instance of a job, mainly to find the actual work directory of the job. In this work directory a file <name>.pid is expected to be present, which contains information needed to figure out the current job status; probably this is just an integer number which is the job id.
In addition, an instance of the script.class will be created, from which the CheckStatus method is called with the .pid file as argument. Depending on the script class, this method will check either the running processes or a batch queue for the job status.
Arguments:
root : name of parallel job
root_generic : generic name of parallel job
rcfile : settings for job tree
Optional arguments:
rcbase : prefix for rcfile keys
env : environment dictionary to expand variables used in rcfile
nsec : number of seconds to wait between status checks
- class utopya_jobtree.UtopyaJobIteration(name, rcfile, rcbase='', env={})¶
Bases:
utopya_jobtree.UtopyaJobTree
Class to create and submit a chain of job scripts that are defined as iteration steps.
Example of a tree defined for name ‘appl’:
    appl
        .build
        .init
            .emis
            .obs
                .point
                .sat
                .valid
        .run
            .iter-0001
                .fwd
                .dep
                .grd
                .opt
        .run
            .iter-0002
                .fwd
                .dep
                .grd
                .opt
            :
        .done
This will create the job chain:
    appl.jb
    appl_build.jb
    appl_init.jb                 (v)
    appl_init_emis.jb
    appl_init_obs.jb             (v)
    appl_init_obs_point.jb
    appl_init_obs_sat.jb
    appl_init_obs_valid.jb
    appl_run.jb                  (v)
    appl_run_iter-0001.jb        (v)
    appl_run_iter-0001_fwd.jb
    appl_run_iter-0001_dep.jb
    appl_run_iter-0001_grd.jb
    appl_run_iter-0001_opt.jb
    appl_run_iter-0002.jb        (v)
    appl_run_iter-0002_fwd.jb
    appl_run_iter-0002_dep.jb
    appl_run_iter-0002_grd.jb
    appl_run_iter-0002_opt.jb
    :
    done.jb
The jobs marked with “(v)” are virtual and actually not created; see the description of virtual jobs.
The iteration list is defined by the following properties:
a generic name used to read settings;
a format to create an element name (iteration) from an integer step number;
the initial step number;
a maximum possible step number, used to perform a loop over possible names.
Example rcfile settings for name ‘appl.run’ that define the iteration steps:
    ! job iteration class:
    appl.run.class : utopya.UtopyaJobIteration
    ! generic for step name used in settings:
    appl.run.generic : iter-NNNN
    ! formatting rule for actual step names given
    ! an integer step number;
    ! syntax should follow str.format() rules ;
    ! here 4 digits with zero padding:
    appl.run.step_format : iter-{step:0>4}
    ! initial step numbers:
    appl.run.step_start : 1
    ! maximum possible number for defined format:
    appl.run.step_max : 9999
    ! optional step size, default 1:
    appl.run.step_size : 1
For the above example, each iteration consists of 4 sub-jobs. Define these using the generic name; optionally mark the iteration job as virtual:
    ! sub list:
    appl.run.iter-NNNN.class : utopya.UtopyaJobTree
    appl.run.iter-NNNN.virtual : True
    appl.run.iter-NNNN.elements : fwd dep grd opt
The method CheckContinuation() from the base class is re-implemented to decide whether a next iteration step should be performed or the loop should be terminated, given a step number. For derived classes that implement a new iteration loop, it might be sufficient to re-define only this method in order to terminate the loop at the right step.
- GetStepNumbers(element)¶
Extract step number from element name, and also return maximum number. This method is used by ‘GetNextElement’ and ‘CheckContinuation’ to translate a job name to a number and decide on the next step.
- Return value is a three-element tuple of integers:
step,step_max,step_size
Current implementation uses the step range setting in the rcfile (from ‘step_start’ to ‘step_max’) to perform a loop over possible step numbers. For each step, the ‘step_format’ is evaluated and compared to the provided element; if a match is found, the step is known. This brute-force test should in future be replaced by a more elegant method reading the number given the format.
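The brute-force search described above could look roughly as follows. This is a sketch only: the function name find_step and its arguments are hypothetical and mirror the rcfile settings, and the code is not taken from the module itself.

```python
def find_step(element, step_format, step_start, step_max, step_size=1):
    """Return the step number whose formatted name equals 'element'.

    Hypothetical helper that mimics the brute-force search described
    in the text; not the module's actual implementation.
    """
    for step in range(step_start, step_max + 1, step_size):
        # evaluate the format for this candidate step and compare:
        if step_format.format(step=step) == element:
            return step
    # no candidate matched the provided element name:
    raise ValueError("no step matches element '%s'" % element)


step = find_step('iter-0004', 'iter-{step:0>4}', 1, 9999)
print(step)
```

The cost of the search grows with the step number, which is why the text suggests replacing it with a method that parses the number directly from the name.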
- GetVariables(element)¶
Return dictionary with job variables; for this class, the iteration step:
{ '__step__' : 4 }
- GetNextElement(element, indent='', parfirst=None)¶
Return name of next element (if present).
As an example, if the element passed as argument is ‘iter-0004’, then the returned value is ‘iter-0005’.
If the element passed as argument is ‘None’, then the name that corresponds to the first iteration step is returned.
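The behaviour described above can be illustrated with a small stand-alone function. The name next_element and its arguments are hypothetical, and returning None once the maximum is exceeded is an assumption made for this sketch rather than documented behaviour of GetNextElement.

```python
def next_element(element, step_format, step_start, step_max, step_size=1):
    """Return the element name that follows 'element' (hypothetical sketch)."""
    # no current element: return the name of the first iteration step
    if element is None:
        return step_format.format(step=step_start)
    # search for the step number that produces the current element name
    for step in range(step_start, step_max + 1, step_size):
        if step_format.format(step=step) == element:
            following = step + step_size
            # assumption: no next element beyond the maximum step number
            if following > step_max:
                return None
            return step_format.format(step=following)
    raise ValueError("no step matches element '%s'" % element)


fmt = 'iter-{step:0>4}'
first = next_element(None, fmt, 1, 9999)
following = next_element('iter-0004', fmt, 1, 9999)
print(first, following)
```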
- CheckContinuation(element)¶
Check iteration status given job name. The job name could be used to derive the iteration step number to decide whether a maximum is reached, or it could be used to read output files to decide whether convergence is reached.
- Returns two str values:
order : one of ‘continue’, ‘finish’, ‘error’
msg : informative message to explain the order
In this implementation, the loop is finished if the maximum step number is reached, defined by the ‘step_max’ value in the settings. The current step number and the maximum are obtained from a call to the
GetStepNumbers()
method.
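The decision rule of this implementation can be sketched as a plain function that returns the two str values. The function name check_continuation and the message texts are illustrative assumptions; the real method takes an element name and obtains the numbers from GetStepNumbers().

```python
def check_continuation(step, step_max):
    """Return (order, msg) for the given step number (hypothetical sketch)."""
    if step >= step_max:
        # maximum step number reached: terminate the iteration loop
        return 'finish', 'reached maximum step number %i' % step_max
    # otherwise a next iteration step should be performed
    return 'continue', 'step %i is below maximum %i' % (step, step_max)


order, msg = check_continuation(4, 9999)
print(order, '-', msg)
```

A derived class that should stop on convergence rather than on a step count would replace only this decision, for example by reading output files of the finished step.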
- class utopya_jobtree.UtopyaJobIteration_CheckFile(name, rcfile, rcbase='', env={})¶
Bases:
utopya_jobtree.UtopyaJobIteration
UTOPyA JobIteration class with the CheckContinuation() method defined to read instructions from an input file.
- CheckContinuation(element)¶
Check iteration status by reading a text file named:
<name>.<element>.msg
The text file should consist of 2 lines that are read and provided as return values:
order : one of ‘continue’, ‘finish’, ‘error’
msg : informative message to explain the order
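Reading such a two-line message file could be sketched as shown below. The function name read_order, the dirname argument, and the validation of the order value are assumptions made for illustration; the example also assumes that the name part of the file name is the job name 'appl.run'.

```python
import os
import tempfile


def read_order(name, element, dirname='.'):
    """Read (order, msg) from '<name>.<element>.msg' (hypothetical sketch)."""
    filename = os.path.join(dirname, '%s.%s.msg' % (name, element))
    with open(filename, 'r') as f:
        lines = f.read().splitlines()
    if len(lines) < 2:
        raise ValueError("expected 2 lines in '%s'" % filename)
    # first line holds the order, second line the explanation:
    order, msg = lines[0].strip(), lines[1].strip()
    if order not in ('continue', 'finish', 'error'):
        raise ValueError("unsupported order '%s'" % order)
    return order, msg


# usage: write an example message file in a temporary directory and read it
with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, 'appl.run.iter-0004.msg'), 'w') as f:
        f.write('finish\nconvergence reached\n')
    result = read_order('appl.run', 'iter-0004', dirname=tmpdir)
print(result)
```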
- class utopya_jobtree.UtopyaJobTask(msg=None)¶
Bases:
object
Dummy class for illustration of the UtopyaJobStep class and its derivatives.
In this implementation, an optional str message can be passed on initialization, which is printed if present.
- class utopya_jobtree.UtopyaJobTaskSubmit(name, rcfile, rcbase='', env={}, name_generic=None, without_next=False, msg=None, _indent='')¶
Bases:
utopya_jobtree.UtopyaJobTask
Job task to create and submit a job(tree).
Arguments:
name : name of the job(tree) settings
rcfile : settings file
Optional arguments:
rcbase, env : initialization arguments for rcfile
name_generic : generic name for settings, for example without iteration number expanded
without_next : if True, do not submit next step in jobtree (used for testing)
msg : informative message passed to base class (used for testing)
For example, with name equal to ‘appl’ the first lines of the settings could be:
! single job:
appl.class : utopya.UtopyaJobStep
! task:
appl.task.class : utopya.UtopyaJobTask
appl.task.args : msg='Perform application task.'
An instance of the class derived from UtopyaJobStep will be created; after initialization, the Run method of the instance is called.
- class utopya_jobtree.UtopyaJobTaskRun(command='')¶
Bases:
utopya_jobtree.UtopyaJobTask
Job task to run executable.
The argument specifies the command:
'appl.x --flag=1 input.txt'
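How such a command string might be executed is sketched below with the standard subprocess module. This is not the class's actual implementation: the helper run_command is hypothetical, and the Python interpreter is used as a stand-in for the 'appl.x' executable so that the example can be run anywhere.

```python
import shlex
import subprocess
import sys


def run_command(command):
    """Run a command line and return its exit status (hypothetical sketch)."""
    # split the command string into program and arguments:
    args = shlex.split(command)
    # check=True raises CalledProcessError on a non-zero exit status:
    completed = subprocess.run(args, check=True)
    return completed.returncode


# stand-in for a command such as 'appl.x --flag=1 input.txt':
command = shlex.join([sys.executable, '-c', 'print("postprocessing done")'])
status = run_command(command)
print(status)
```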
- class utopya_jobtree.UtopyaJobTaskWait(nsec=5, nsecinfo=None, msg=None)¶
Bases:
utopya_jobtree.UtopyaJobTask
Job task to wait for a while, sometimes useful for testing.
Arguments:
nsec : number of seconds to wait
nsecinfo : show message after specified number of seconds
msg : informative message
- class utopya_jobtree.UtopyaJobTreeTiming(name, rcfile, rcbase='', env={}, html=False)¶
Bases:
utopya_rc.UtopyaRc
Collect timing profiles written by jobs in a job tree, and create an overall timing profile.
Arguments:
name : base name of job tree
rcfile : settings for jobtree with specified base name
Optional arguments:
rcbase, env : used for initialization of rcfile configuration
html : if enabled, create html index page with bar plots of timing