Jobs are submitted with the qsub command.
The options tell Torque information about the job, such as what resources will be needed. These can be specified in the job script as PBS directives, on the command line as options, or both (in which case the command-line options take precedence should the two contradict each other). For each command-line option there is a corresponding PBS directive with the syntax:

#PBS <option>
For example, you can specify that a job needs 2 nodes and 8 cores on each node by adding to the script the directive:

#PBS -l nodes=2:ppn=8
or as a command-line option to qsub when you submit the job:

qsub -l nodes=2:ppn=8 jobscript.sh
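To illustrate how several directives combine in one script (the walltime value, job name, and script contents here are arbitrary examples, not site defaults), a minimal job script might look like:

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=8          # 2 nodes, 8 cores per node
#PBS -l walltime=01:00:00      # assumed example: 1 hour wall-clock limit
#PBS -N example_job            # job name (arbitrary)

# Torque starts jobs in the home directory; move to where qsub was run.
cd "$PBS_O_WORKDIR"

# $PBS_NODEFILE lists one line per allocated core.
echo "Allocated $(sort -u "$PBS_NODEFILE" | wc -l) distinct nodes"
```

Submitting this with qsub jobscript.sh uses the directives as written; a command-line option such as qsub -l walltime=02:00:00 jobscript.sh would override the corresponding directive.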
Running many small tasks with Job Arrays or pbsdsh
Options for many similar jobs (array jobs and pbsdsh):
- qsub -t <array_ids>: submit an array of jobs with array ids as specified. Array ids can be specified as a numerical range (e.g. 1-10), a comma-separated list of numbers, or some combination of the two. Each job instance will have an environment variable $PBS_ARRAYID set to its own array id.
- qsub -t <array_ids>%5: as above, but the appended %n specifies the maximum number of array items (in this case, 5) which should be running at one time.
- Submit a single "shepherd" job requesting multiple processes and from it start the individual jobs with pbsdsh.
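A sketch of a job-array script, assuming the first option above (the array range, input file names, and selection logic are illustrative assumptions): each instance reads $PBS_ARRAYID to pick its own piece of work.

```shell
#!/bin/bash
#PBS -t 1-10%5                 # assumed example: array ids 1-10, at most 5 running at once
#PBS -l nodes=1:ppn=1

# Default the id so the selection logic can also be exercised outside Torque.
PBS_ARRAYID="${PBS_ARRAYID:-1}"

# Each array member works on its own (hypothetical) input file.
INPUT="input_${PBS_ARRAYID}.dat"
echo "array member ${PBS_ARRAYID} -> ${INPUT}"
```

Under Torque each instance shares the script but sees a different $PBS_ARRAYID, so no per-job editing or looping is needed at submission time.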
The naive approach to running a large set of jobs based on the same script is to qsub the script repeatedly at the command line, perhaps changing a few environment variables, directories or input files each time.
A slightly less naive approach is to parameterize the script with some variables and qsub it in a shell loop.
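For instance (the file pattern, variable name, and script name here are assumptions for illustration), such a loop might pass a parameter into each job's environment with qsub's -v option:

```shell
#!/bin/bash
# Submit one job per input file; -v exports the named variable into the
# job's environment, where process.sh can read it as $INPUT_FILE.
for f in input_*.dat; do
    qsub -v INPUT_FILE="$f" process.sh
done
```

This works, but every iteration pays the full submission and scheduling overhead, which motivates the two Torque mechanisms below.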
Torque offers two methods, both more elegant than either of the above, for managing such workflows:
- A job array groups a set of jobs under the same $PBS_JOBID, each with a unique $PBS_ARRAYID. Batch-system commands such as qdel can be called on individual jobs or on the job array as a whole.
- If the individual jobs are small, the queuing overhead is relatively large. In this circumstance it is better to launch a single parallel job which uses pbsdsh to run the set of small jobs.
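A sketch of the shepherd approach (the resource request and the task script name are assumptions): the shepherd requests many cores once, and pbsdsh then launches one copy of the task on every allocated slot.

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=8          # assumed example: 16 slots for the small tasks
#PBS -l walltime=01:00:00

cd "$PBS_O_WORKDIR"

# pbsdsh runs the given command once per allocated core; each copy can
# inspect $PBS_VNODENUM to decide which piece of work it should do.
pbsdsh bash "$PBS_O_WORKDIR/task.sh"
```

Because only one job passes through the queue, the per-job scheduling overhead is paid once rather than once per small task.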
Using pbsdsh for many small jobs
(Still to come: a solution for when you have a huge number of jobs, each needing only a few minutes. Note, however, that this approach performs badly if the run times of the jobs are not uniform.)