sbatch Tutorial

  1. Synopsis
  2. What is qsub?
  3. What does qsub do?
  4. Arguments to control behavior

Anchor
synopsis
synopsis

Synopsis

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleqsub Synopsis
collapsefalse
qsub
[-a date_time]
[-A account_string]
[-b secs]
[-c checkpoint_options]
              n  No checkpointing is to be performed.
              s  Checkpointing is to be performed only when the server executing the job is shutdown.
              c  Checkpointing is to be performed at the default minimum time for the server executing
                 the job.
              c=minutes
                 Checkpointing is to be performed at an interval of minutes, which is the integer number
                 of minutes of CPU time used by the job. This value must be greater than zero.
[-C directive_prefix] [-d path] [-D path] [-e path] [-f] [-h]
[-I ]
[-j join ]
[-k keep ]
[-l resource_list ]
[-m mail_options]
[-M user_list]
[-N name]
[-o path]
[-p priority]
[-P user[:group]]
[-q destination]
[-r c]
[-S path_list]
[-t array_request]
[-u user_list]
[-v variable_list]
[-V ]
[-W additional_attributes]
[-X]
[-z]
[script]

For detailed information, see this page.

Anchor
what_is_qsub
what_is_qsub

What is qsub?

qsub is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission scripts or command file. Several of the most widely used arguments are described in detail below.

Info
titleUseful Information

For more information on qsub, run:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleMore information on qsub
collapsefalse
$ man qsub

Anchor
what_qsub_do
what_qsub_do

What does qsub do?

Anchor
overview
overview

Overview

All of our clusters have a batch server, referred to as the cluster management server, running on the head node. This batch server monitors the status of the cluster and controls and monitors the various queues and job lists. A scheduler tied into the batch server decides how a job should be run and where it is placed in the queue. qsub interfaces with the batch server and lets it know that another job has requested resources on the cluster. Once the batch server has received a job, the scheduler decides its placement and notifies the batch server, which in turn notifies qsub (Torque/PBS) whether the job can be run or not. The current status (whether the job was successfully scheduled or not) is then returned to the user. You may use a command file or STDIN as input for qsub.

Anchor
env_variables
env_variables

Environment variables in qsub

The qsub command will pass certain environment variables in the Variable_List attribute of the job. These variables will be available to the job. The value for the following variables will be taken from the environment of the qsub command:

  • HOME (the path to your home directory)
  • LANG (which language you are using)
  • LOGNAME (the name that you logged in with)
  • PATH (standard path to executables)
  • MAIL (location of the user's mail file)
  • SHELL (command shell, e.g. bash, sh, zsh, csh, etc.)
  • TZ (time zone)

These values will be assigned to a new name, which is the current name prefixed with the string "PBS_O_". For example, the job will have access to an environment variable named PBS_O_HOME, which has the value of the variable HOME in the qsub command environment.
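
The renaming rule can be sketched in a few lines of shell. This is only an illustration of the naming convention described above; the helper name pbs_o_name is hypothetical and is not part of qsub or PBS.

```shell
#!/bin/bash
# Illustration only: show the PBS_O_ name that each submission-time
# variable is copied to. pbs_o_name is a hypothetical helper.
pbs_o_name() {
    printf 'PBS_O_%s\n' "$1"
}

for name in HOME LANG LOGNAME PATH MAIL SHELL TZ; do
    echo "$name -> $(pbs_o_name "$name")"
done
```

Inside a running job, the real values can be read directly, for example with echo "$PBS_O_HOME".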

In addition to these standard environment variables, there are additional environment variables available to the job.

  • PBS_O_HOST (the name of the host upon which the qsub command is running)
  • PBS_SERVER (the hostname of the pbs_server which qsub submits the job to)
  • PBS_O_QUEUE (the name of the original queue to which the job was submitted)
  • PBS_O_WORKDIR (the absolute path of the current working directory of the qsub command)
  • PBS_ARRAYID (each member of a job array is assigned a unique identifier)
  • PBS_ENVIRONMENT (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job)
  • PBS_JOBID (the job identifier assigned to the job by the batch system)
  • PBS_JOBNAME (the job name supplied by the user)
  • PBS_NODEFILE (the name of the file containing the list of nodes assigned to the job)
  • PBS_QUEUE (the name of the queue from which the job was executed)
  • PBS_WALLTIME (the walltime requested by the user or default walltime allotted by the scheduler)
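
A job script can print these variables for debugging. The sketch below assumes nothing beyond the variable names listed above; the function name describe_job is hypothetical, and each variable falls back to the text "unset" so the script also runs outside a batch job.

```shell
#!/bin/bash
# Print a short summary of the PBS job environment.
# describe_job is a hypothetical helper; every value falls back to
# "unset" when the variable is not defined (for example on a login node).
describe_job() {
    echo "Job ID:       ${PBS_JOBID:-unset}"
    echo "Job name:     ${PBS_JOBNAME:-unset}"
    echo "Queue:        ${PBS_QUEUE:-unset}"
    echo "Submitted on: ${PBS_O_HOST:-unset}"
    echo "Submit dir:   ${PBS_O_WORKDIR:-unset}"
    echo "Node file:    ${PBS_NODEFILE:-unset}"
}

describe_job
```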

Anchor
arguments
arguments

Arguments to control behavior

As stated before, there are several arguments that you can use to make your jobs behave in a specific way. This is not an exhaustive list, but it covers some of the most widely used arguments and many that you will probably need to accomplish specific tasks.

Anchor
date_time
date_time

Declare the date/time a job becomes eligible for execution

To set the date/time at which a job becomes eligible to run, use the --begin argument. If --begin is not specified, sbatch assumes that the job should be run immediately.

Example

1) Defer job until 5 minutes later.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleExample: Set the date/time which a job becomes eligible to run
collapsefalse
#SBATCH --begin=now+5minute

2) Set a specific date/time to run the job. This example defers the job until 16:10:00 (HH:MM:SS); a date may also be given, for example in MM/DD/YY form.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleExample: Set the date/time which a job becomes eligible to run
collapsefalse
#SBATCH --begin=16:10:00
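
An absolute start time can also be computed on the command line at submission time. The sketch below assumes GNU date (for the -d option) and relies on sbatch accepting --begin values of the form YYYY-MM-DDTHH:MM:SS; the helper name begin_in_minutes and the script name job.sh are hypothetical.

```shell
#!/bin/bash
# Print a timestamp N minutes from now in YYYY-MM-DDTHH:MM:SS form.
# Assumes GNU date; begin_in_minutes is a hypothetical helper.
begin_in_minutes() {
    date -d "+$1 minutes" +%Y-%m-%dT%H:%M:%S
}

# Show the command that would defer a job by 5 minutes (job.sh is a
# placeholder script name; nothing is submitted here).
echo "sbatch --begin=$(begin_in_minutes 5) job.sh"
```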

Anchor
work_dir
work_dir

Defining the working directory path to be used for the job

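To define the working directory path to be used for the job, use the --workdir option. If it is not specified, the default working directory is the directory where the sbatch command is executed. Newer Slurm releases name this option --chdir, so check man sbatch on your cluster if --workdir is rejected.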

Example
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
#SBATCH --workdir="bashdir"
# Or, with an absolute path:
#SBATCH --workdir="/home/user_account/workdir"

Anchor
manipulate
manipulate

Manipulate the output files

By default, a PBS job writes all stdout (standard output) messages to a file named <job_name>.o<job_id> and all stderr (standard error) messages to a file named <job_name>.e<job_id>. These files will be copied to your working directory as soon as the job starts. To rename a file or specify a different location, use -o for the standard output file and -e for the standard error file. You can also combine the two streams using -j.

With sbatch, specify the --output option to write standard output to a file and the --error option to write standard error to a file.

Example

slurm_%j.out is the filename, where "%j" is replaced by the job ID.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleCreate a simple submission file:
collapsefalse
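$ cat sleep.pbs
#!/bin/bash
# Scheduler directives must come before the first command.
# Slurm: write standard output to slurm_<job ID>.out
#SBATCH --output=slurm_%j.out
# PBS: write standard output to sleep.log
#PBS -o sleep.log

# Brace expansion is a bash feature, hence the bash shebang.
for i in {1..60} ; do
       echo $i
       sleep 1
done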
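
To make the naming rules concrete, the sketch below reproduces them with plain string handling. It does not call sbatch or qsub; the helper names and the sample job name and job IDs are made up for illustration.

```shell
#!/bin/bash
# Expand %j in a Slurm filename pattern, as described in the text.
slurm_filename() {    # usage: slurm_filename PATTERN JOB_ID
    printf '%s\n' "$1" | sed "s/%j/$2/g"
}

# Build the default PBS standard output and standard error file names.
pbs_default_files() { # usage: pbs_default_files JOB_NAME JOB_ID
    echo "$1.o$2"
    echo "$1.e$2"
}

slurm_filename 'slurm_%j.out' 12345
pbs_default_files sleep.pbs 5594671
```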
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmit your job with the standard output file renamed:
collapsefalse
$ qsub -o sleep.log sleep.pbs
Tip
titleHandy Hint

This option can be added to pbs script with a PBS directive such as

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleEquivalent PBS Directive
collapsefalse
#PBS -o sleep.log
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmit your job with the standard error file renamed:
collapsefalse
$ qsub -e sleep.log sleep.pbs
Tip
titleHandy Hint

This option can be added to pbs script with a PBS directive such as

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleEquivalent PBS Directive
collapsefalse
#PBS -e sleep.log
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleCombine them using the name sleep.log:
collapsefalse
$ qsub -o sleep.log -j oe sleep.pbs
Tip
titleHandy Hint

This option can be added to pbs script with a PBS directive such as

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleEquivalent PBS Directive
collapsefalse
#PBS -o sleep.log
#PBS -j oe
Warning
titleWarning

The order of two letters next to flag -j is important. It should always start with the letter that's been already defined before, in this case 'o'.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlePlace the joined output in another location other than the working directory:
collapsefalse
$ qsub -o $HOME/tutorials/logs/sleep.log -j oe sleep.pbs
# The matching sbatch directive for the standard error file:
#SBATCH --error=slurm_%j.out


Anchor
mail_job_status
mail_job_status

Mail job status at the start and end of a job

The mailing options are set using the -m and -M arguments. The -m argument sets the conditions under which the batch server will send a mail message about the job, and -M defines the users that emails will be sent to (multiple users can be specified in a list separated by commas). The conditions for the -m argument include:

  • a: mail is sent when the job is aborted.
  • b: mail is sent when the job begins.
  • e: mail is sent when the job ends.
Example
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleUsing the sleep.pbs script created earlier, submit a job that emails you for all conditions:
collapsefalse
$ qsub -m abe -M NetID@nyu.edu sleep.pbs
Tip
titleHandy Hint

This option can be added to pbs script with a PBS directive such as

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleEquivalent PBS Directive
collapsefalse
#PBS -m abe
#PBS -M NetID@nyu.edu

Anchor
specific_queue
specific_queue

Submit a job to a specific queue

You can select a queue based on walltime needed for your job. Use the 'qstat -q' command to see the maximum job times for each queue.

Example
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmit a job to the bigmem queue:
collapsefalse
$ qsub -q bigmem sleep.pbs
Tip
titleHandy Hint

This option can be added to pbs script with a PBS directive such as

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleEquivalent PBS Directive
collapsefalse
#PBS -q bigmem

Anchor
dependency
dependency

Submitting a job that is dependent on the output of another

Often you will have jobs that depend on the output of another job in order to run. To add a dependency, use the -W (additional attributes) argument with the depend option. We will be using the afterok rule, but there are several other rules that may be useful (see man qsub).

Example

To illustrate the ability to hold execution of a specific job until another has completed, we will write two submission scripts. The first will create a list of random numbers. The second will sort those numbers. Since the second script will depend on the list that is created we will need to hold execution until the first has finished.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlerandom.pbs
collapsefalse
$ cat random.pbs
#!/bin/bash
cd $HOME
sleep 120
for i in {1..100}; do
     echo $RANDOM >> rand.list
done
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlesort.pbs
collapsefalse
$ cat sort.pbs
#!/bin/sh
cd $HOME
sort -n rand.list > sorted.list
sleep 30

Once the files are created, let's see what happens when they are submitted at the same time:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmit at the same time
collapsefalse
$ qsub random.pbs ; qsub sort.pbs
5594670.hpc0.local
5594671.hpc0.local
$ ls
random.pbs  sorted.list  sort.pbs  sort.pbs.e5594671  sort.pbs.o5594671
$ cat sort.pbs.e5594671
sort: open failed: rand.list: No such file or directory

Since they both ran at the same time, the sort script failed because the file rand.list had not been created yet. Now submit them with the dependencies added.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmit them with the dependencies added
collapsefalse
$ qsub random.pbs
5594674.hpc0.local
$ qsub -W depend=afterok:5594674.hpc0.local sort.pbs
5594675.hpc0.local
$ qstat -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5594674.hpc0.loc     manchu   ser2     random.pbs        18029     1   1    --  48:00 R 00:00
5594675.hpc0.loc     manchu   ser2     sort.pbs                    1   1    --  48:00 H   --

We now see that the sort.pbs job is in a hold state. Once the job it depends on completes, the sort job runs and we see:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleJob status with the dependencies added
collapsefalse
$ qstat -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5594675.hpc0.loc     manchu   ser2     sort.pbs          18165     1   1    --  48:00 R   --
Info
titleUseful Information
  • afterany:jobid[:jobid...] implies that job may be scheduled for execution after jobs jobid have terminated, with or without errors.
  • afterok:jobid[:jobid...] implies that job may be scheduled for execution only after jobs jobid have terminated with no errors.
  • afternotok:jobid[:jobid...] implies that job may be scheduled for execution only after jobs jobid have terminated with errors.
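
The value passed to -W can be assembled from a rule name and one or more job IDs. The helper below is a hypothetical sketch that only builds the string in the rule:jobid[:jobid...] form listed above; it does not submit anything.

```shell
#!/bin/bash
# Build a dependency attribute such as depend=afterok:ID1:ID2.
# depend_attr is a hypothetical helper, not part of qsub.
depend_attr() {       # usage: depend_attr RULE JOBID [JOBID...]
    local rule=$1 attr id
    shift
    attr="depend=$rule"
    for id in "$@"; do
        attr="$attr:$id"
    done
    echo "$attr"
}

depend_attr afterok 5594674.hpc0.local
```

The result can be passed straight to qsub, for example qsub -W "$(depend_attr afterok "$job")" sort.pbs, where $job holds the ID printed by an earlier qsub call.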

Anchor
dependency_loop
dependency_loop

Submitting multiple jobs in a loop that depend on output of another job

This example shows how to submit multiple jobs in a loop where each job depends on the output of the job submitted before it.

Example

Let's say we need to write the numbers from 0 to 999999, in order, to a file output.txt. We can do 10 separate runs to achieve this, where each run has a separate PBS script that writes 100,000 numbers to the output file. Let's see what happens if we submit all 10 jobs at the same time.

The script below creates the required PBS scripts for all the runs.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleCreate PBS Scripts for all the runs
collapsefalse
$ cat creation.sh
#!/bin/bash

for i in {0..9}
do
    cat > pbs.script.$i << EOF
#!/bin/bash

#PBS -l nodes=1:ppn=1,walltime=600

cd \$PBS_O_WORKDIR

for ((i=$((i*100000)); i<$(((i+1)*100000)); i++))
 {
    echo "\$i" >> output.txt
 }

exit 0;

EOF
done
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleChange permission to make it an executable
collapsefalse
$ chmod u+x creation.sh
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleRun the Script
collapsefalse
$ ./creation.sh
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleList of Created PBS Scripts
collapsefalse
$ ls -l pbs.script.*
-rw-r--r-- 1 manchu wheel 134 Oct 27 16:32 pbs.script.0
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.1
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.2
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.3
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.4
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.5
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.6
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.7
-rw-r--r-- 1 manchu wheel 139 Oct 27 16:32 pbs.script.8
-rw-r--r-- 1 manchu wheel 140 Oct 27 16:32 pbs.script.9
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlePBS Script
collapsefalse
$ cat pbs.script.0
#!/bin/bash

#PBS -l nodes=1:ppn=1,walltime=600

cd $PBS_O_WORKDIR

for ((i=0; i<100000; i++))
 {
    echo "$i" >> output.txt
 }

exit 0;
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmit Multiple Jobs at a Time
collapsefalse
$ for i in {0..9}; do qsub pbs.script.$i ; done
5633531.hpc0.local
5633532.hpc0.local
5633533.hpc0.local
5633534.hpc0.local
5633535.hpc0.local
5633536.hpc0.local
5633537.hpc0.local
5633538.hpc0.local
5633539.hpc0.local
5633540.hpc0.local
$
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleoutput.txt
collapsefalse
$ tail output.txt
699990
699991
699992
699993
699994
699995
699996
699997
699998
699999
$ grep -n 999999 output.txt
210510:999999
$

This clearly shows that the numbers are not in the order we wanted. This is because all the runs wrote to the same file at the same time.

Let's submit the jobs using the qsub dependency feature. This can be achieved with the simple script shown below.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSimple Script to Submit Multiple Dependent Jobs
collapsefalse
$ cat dependency.pbs
#!/bin/bash

job=`qsub pbs.script.0`
for i in {1..9}
do
    job_next=`qsub -W depend=afterok:$job pbs.script.$i`
    job=$job_next
done
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleLet's make it an executable
collapsefalse
$ chmod u+x dependency.pbs
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmit dependent jobs by running the script
collapsefalse
$ ./dependency.pbs
$ qstat -u manchu

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5633541.hpc0.loc     manchu   ser2     pbs.script.0      28646     1   1    --  00:10 R   --
5633542.hpc0.loc     manchu   ser2     pbs.script.1        --      1   1    --  00:10 H   --
5633543.hpc0.loc     manchu   ser2     pbs.script.2        --      1   1    --  00:10 H   --
5633544.hpc0.loc     manchu   ser2     pbs.script.3        --      1   1    --  00:10 H   --
5633545.hpc0.loc     manchu   ser2     pbs.script.4        --      1   1    --  00:10 H   --
5633546.hpc0.loc     manchu   ser2     pbs.script.5        --      1   1    --  00:10 H   --
5633547.hpc0.loc     manchu   ser2     pbs.script.6        --      1   1    --  00:10 H   --
5633548.hpc0.loc     manchu   ser2     pbs.script.7        --      1   1    --  00:10 H   --
5633549.hpc0.loc     manchu   ser2     pbs.script.8        --      1   1    --  00:10 H   --
5633550.hpc0.loc     manchu   ser2     pbs.script.9        --      1   1    --  00:10 H   --
$
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleOutput after first run
collapsefalse
$ tail output.txt
99990
99991
99992
99993
99994
99995
99996
99997
99998
99999
$
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleOutput after final run
collapsefalse
$ tail output.txt
999990
999991
999992
999993
999994
999995
999996
999997
999998
999999
$ grep -n 100000 output.txt
100001:100000
$ grep -n 999999 output.txt
1000000:999999
$

This shows that the numbers are written to output.txt in order, which in turn shows that each job ran only after the successful completion of the previous one.
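
Spot checks with tail and grep can be replaced by a check of the whole file. A minimal sketch, assuming a POSIX sort (its -c option reports whether the input is already sorted); the function name check_order is hypothetical, and the demonstration uses small temporary files rather than the tutorial data.

```shell
#!/bin/bash
# Report whether FILE is in ascending numeric order.
# check_order is a hypothetical helper built on "sort -n -c".
check_order() {
    if sort -n -c "$1" 2>/dev/null; then
        echo "in order"
    else
        echo "out of order"
    fi
}

# Small self-contained demonstration on temporary files.
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 10 > "$tmp/good.txt"
printf '%s\n' 1 10 2 3 > "$tmp/bad.txt"
check_order "$tmp/good.txt"
check_order "$tmp/bad.txt"
rm -r "$tmp"
```

Running check_order output.txt after the dependent jobs finish applies the same check to the tutorial's output file.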

Anchor
interactive
interactive

Opening an interactive shell to the compute node

To open an interactive shell to a compute node, use the -I argument. This is often used in conjunction with -X (X11 forwarding) and -V (pass all of the user's environment).

Example
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleOpen an interactive shell to a compute node
collapsefalse
$ qsub -I

Anchor
pass_env_var
pass_env_var

Passing an environment variable to your job

You can pass user defined environment variables to a job by using the -v argument.

Example

To test this we will use a simple script that prints out an environment variable.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlePassing an environment variable
collapsefalse
$ cat variable.pbs
#!/bin/sh
if [ "x" = "x$MYVAR" ] ; then
    echo "Variable is not set"
else
    echo "Variable says: $MYVAR"
fi

Next, use qsub without -v and check your standard output file:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleqsub without -v
collapsefalse
$ qsub variable.pbs
5596675.hpc0.local
$ cat variable.pbs.o5596675
Variable is not set

Then use -v to set the variable:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleqsub with -v
collapsefalse
$ qsub -v MYVAR="hello" variable.pbs
5596676.hpc0.local
$ cat variable.pbs.o5596676
Variable says: hello
Tip
titleHandy Hint

This option can be added to pbs script with a PBS directive such as

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleEquivalent PBS Directive
collapsefalse
#PBS -v MYVAR="hello"
Info
titleUseful Information

Multiple user defined environment variables can be passed to a job at a time.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlePassing Multiple Variables
collapsefalse
$ cat variable.pbs
#!/bin/sh
echo "$VAR1 $VAR2 $VAR3" > output.txt
$
$ qsub -v VAR1="hello",VAR2="Sreedhar",VAR3="How are you?" variable.pbs
5627200.hpc0.local
$ cat output.txt
hello Sreedhar How are you?
$
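
Building on the variable.pbs example above, a job script can give a missing variable a fallback instead of testing it by hand. The sketch below uses the shell's ${VAR:-default} expansion; the function name report_myvar and the fallback text are illustrative only.

```shell
#!/bin/bash
# Print MYVAR, or a fallback text when it was not passed with -v.
# report_myvar is a hypothetical helper.
report_myvar() {
    echo "Variable says: ${MYVAR:-<not set>}"
}

report_myvar
```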

Anchor
pass_env
pass_env

Passing your environment to your job

You may declare that all of your environment variables are passed to the job by using the -V argument in qsub.

Example

Use qsub to perform an interactive login to one of the nodes:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlePassing your environment: qsub with -V
collapsefalse
$ qsub -I -V
Tip
titleHandy Hint

This option can be added to pbs script with a PBS directive such as

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleEquivalent PBS Directive
collapsefalse
#PBS -V

Once the shell is opened, use the env command to see that your environment was passed to the job correctly. You should still have access to all your modules that you loaded previously.

Anchor
array_job
array_job

Submitting an array job: Managing groups of jobs

Sometimes users will want to submit large numbers of jobs based on the same job script. Rather than using a script to repeatedly call qsub, a feature known as job arrays exists to allow the creation of multiple jobs with one qsub command. Additionally, this feature includes a new job naming convention that allows users to reference the entire set of jobs as a unit, or to reference one particular job from the set. Each job submitted will have a job ID in the format <id>[<num>].hostname. In the case of a submission number of 5554444, each 5554444[x] job has an environment variable called PBS_ARRAYID, which is set to the value of the array index of the job, so 5554444[0].hostname would have PBS_ARRAYID set to 0. This allows you to create job arrays where each job in the array performs slightly different actions based on the value of this variable, such as performing the same tasks on different input files. One other difference in the environment between jobs in the same array is the value of the PBS_JOBNAME variable.
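
The naming convention can be taken apart with shell parameter expansion. This sketch assumes job IDs of the form <id>[<num>].hostname as described above; the helper names array_index and array_base are hypothetical.

```shell
#!/bin/bash
# Extract the array index from an ID such as 5554444[3].hostname.
array_index() {
    local id=$1
    id=${id#*\[}      # drop everything up to and including "["
    echo "${id%%\]*}" # drop "]" and everything after it
}

# Extract the submission number shared by all members of the array.
array_base() {
    echo "${1%%\[*}"
}

array_index '5554444[3].hostname'
array_base '5554444[3].hostname'
```

Inside a running array job this parsing is not needed, because PBS_ARRAYID already holds the index.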

Anchor
example
example

Example

First we need to create data to be read. Note that in a real application, this could be data, configuration settings, or anything else that your program needs to run.

Anchor
input_data
input_data

Create Input Data

To create input data, run this simple one-liner:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleCreating input data
collapsefalse
$ for i in {0..4}; do echo "Input data file for an array $i" > input.$i ; done

$ ls input.*
input.0  input.1  input.2  input.3  input.4

$ cat input.0
Input data file for an array 0

Anchor
sub_script
sub_script

Submission Script
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmission Script: array.pbs
collapsefalse
$ cat array.pbs
#!/bin/sh

#PBS -l nodes=1:ppn=1,walltime=5:00
#PBS -N arraytest

cd ${PBS_O_WORKDIR}    # Take me to the directory where I launched qsub
# This part of the script handles the data. In a real world situation you will probably
# be using an existing application.
cat input.${PBS_ARRAYID} > output.${PBS_ARRAYID}
echo "Job Name is ${PBS_JOBNAME}" >> output.${PBS_ARRAYID}
sleep 30

exit 0;

Anchor
sub_mon
sub_mon

Submit & Monitor

Instead of running five qsub commands, we can simply enter:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleSubmitting and Monitoring Array of Jobs
collapsefalse
$ qsub -t 0-4 array.pbs
5534017[].hpc0.local

Anchor
qstat
qstat

qstat
Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleqstat
collapsefalse
$ qstat -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534017[].hpc0.l     sm4082   ser2     arraytest                1   1    --  00:05 R   --

$ qstat -t -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534017[0].hpc0.     sm4082   ser2     arraytest-0       12017     1   1    --  00:05 R   --
5534017[1].hpc0.     sm4082   ser2     arraytest-1       12050     1   1    --  00:05 R   --
5534017[2].hpc0.     sm4082   ser2     arraytest-2       12084     1   1    --  00:05 R   --
5534017[3].hpc0.     sm4082   ser2     arraytest-3       12117     1   1    --  00:05 R   --
5534017[4].hpc0.     sm4082   ser2     arraytest-4       12150     1   1    --  00:05 R   --

$ ls output.*
output.0  output.1  output.2  output.3    output.4

$ cat output.0
Input data file for an array 0
Job Name is arraytest-0

Anchor
pbstop
pbstop

pbstop

By default, pbstop doesn't show all the jobs in an array. Instead, it shows the whole array as a single line in the job information. Pressing 'A' shows all the jobs in the array. The same can be achieved with the command line option '-A'. This option, along with '-u <NetID>', shows all of your jobs, both array jobs and normal jobs.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titlepbstop
collapsefalse
$ pbstop -A -u $USER
Note
titleNote

Typing 'A' expands/collapses array job representation.

Anchor
comma_lists
comma_lists

Comma delimited lists

The -t option of qsub also accepts comma-delimited lists of array indices, so you are free to choose how to index the members of your job array. For example:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleComma delimited lists
collapsefalse
$ rm output.*

$ qsub -t 2,5,7-9 array.pbs
5534018[].hpc0.local

$ qstat -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534018[].hpc0.l     sm4082   ser2     arraytest                 1   1    --  00:05 Q   --

$ qstat -t -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534018[2].hpc0.     sm4082   ser2     arraytest-2       12319     1   1    --  00:05 R   --
5534018[5].hpc0.     sm4082   ser2     arraytest-5       12353     1   1    --  00:05 R   --
5534018[7].hpc0.     sm4082   ser2     arraytest-7       12386     1   1    --  00:05 R   --
5534018[8].hpc0.     sm4082   ser2     arraytest-8       12419     1   1    --  00:05 R   --
5534018[9].hpc0.     sm4082   ser2     arraytest-9       12452     1   1    --  00:05 R   --

$ ls output.*
output.2  output.5  output.7  output.8    output.9

$ cat output.2
Input data file for an array 2
Job Name is arraytest-2

Anchor
arrays_step
arrays_step

A more general for loop - Arrays with step size

By default, PBS doesn't allow array jobs with a step size. qsub -t 0-10 <pbs.script> increments PBS_ARRAYID in steps of 1. To submit jobs in steps of a certain size, say a step size of 3 starting at 0 and ending at 10, one has to do

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
qsub -t 0,3,6,9 <pbs.script>

To make this easier, we provide a wrapper that accepts a starting point, an ending point and a step size as the argument to the -t flag. This removes the default restriction that PBS_ARRAYID increase by 1. With the wrapper, which expands the list behind the scenes, the request above can be written as

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
qsub -t 0-10:3 <pbs.script>

Here, 0 is the starting point, 10 is the ending point and 3 is the step size. The starting point does not have to be 0; it can be any number. If the upper bound is not equal to the lower bound plus an integer multiple of the step size, for example

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
qsub -t 0-10:3 <pbs.script>

the wrapper automatically adjusts the upper bound to the last index that fits (9 in this case), as shown in the example below.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleArrays with step size
collapsefalse
[sm4082@login-0-0 ~]$ qsub -t 0-10:3 array.pbs
6390152[].hpc0.local

[sm4082@login-0-0 ~]$ qstat -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
6390152[].hpc0.l     sm4082   ser2     arraytest           --      1   1    --  00:05 Q   --
[sm4082@login-0-0 ~]$ qstat -t -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
6390152[0].hpc0.     sm4082   ser2     arraytest-0       25585     1   1    --  00:05 R   --
6390152[3].hpc0.     sm4082   ser2     arraytest-3       28227     1   1    --  00:05 R   --
6390152[6].hpc0.     sm4082   ser2     arraytest-6        8515     1   1    --  00:05 R 00:00
6390152[9].hpc0.     sm4082   ser2     arraytest-9         505     1   1    --  00:05 R   --

[sm4082@login-0-0 ~]$ ls output.*
output.0  output.3  output.6  output.9

[sm4082@login-0-0 ~]$ cat output.9
Input data file for an array 9
Job Name is arraytest-9
[sm4082@login-0-0 ~]$
Note
titleNote

By default, PBS does not support arrays with a step size. On our clusters this is provided by a wrapper, so the option may not be available on clusters at other organizations or schools that use PBS/Torque.
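
On a cluster without such a wrapper, the same expansion can be done by hand before calling qsub. The sketch below is an illustration only and is not the wrapper installed on our clusters; the function name expand_step_range is hypothetical. It turns a START-END:STEP request into the explicit comma-delimited list and leaves any other argument unchanged.

```shell
#!/bin/bash
# Hypothetical helper (not the cluster's actual wrapper): expand a
# START-END:STEP request into the comma-delimited list that -t accepts.
expand_step_range() {
    local spec=$1 start end step i list=""
    local re='^([0-9]+)-([0-9]+):([1-9][0-9]*)$'
    if [[ $spec =~ $re ]]; then
        # Force base 10 so that values such as 08 are not read as octal.
        start=$((10#${BASH_REMATCH[1]}))
        end=$((10#${BASH_REMATCH[2]}))
        step=$((10#${BASH_REMATCH[3]}))
        for (( i = start; i <= end; i += step )); do
            list+="${list:+,}$i"
        done
        echo "$list"
    else
        # Not a step-size request: pass it through untouched.
        echo "$spec"
    fi
}

expand_step_range 0-10:3
```

The result could then be passed on as qsub -t "$(expand_step_range 0-10:3)" <pbs.script>. Because the loop stops at the last index that does not exceed the ending point, an upper bound of 10 with a step of 3 ends at 9, matching the behavior shown above.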

Anchor
delete
delete

Note
titleNote

If your PBS scripts submit further jobs by running qsub over ssh on a login node, with a statement such as

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
ssh login-0-0 "cd ${PBS_O_WORKDIR};`which qsub` -t 0-10:3 <pbs.script>"

arrays with a step size will not work unless you either add

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
shopt -s expand_aliases

to your PBS script (if it is written in bash) or add it to the .bashrc in your home directory. This setting makes the qsub alias take effect, so that the wrapper processes the command-line options passed to qsub. (More generally, it enables every alias for commands executed via SSH.)

If you have

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
#PBS -t 0-10:3

in your PBS script itself, you do not need to add shopt -s expand_aliases to either your PBS script or your .bashrc.
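
The effect of expand_aliases can be seen without a cluster. The sketch below is an illustration only: it writes a two-line throwaway script that defines and then uses an alias (the alias name demo_alias_cmd is made up for this example) and runs it in a non-interactive bash, first with default settings and then with the option switched on.

```shell
#!/bin/bash
# Throwaway script: define an alias on one line, use it on the next.
demo=$(mktemp)
cat > "$demo" <<'EOF'
alias demo_alias_cmd='echo alias-expanded'
demo_alias_cmd
EOF

# Non-interactive bash ignores aliases by default, so the command is not found.
out_default=$(bash "$demo" 2>/dev/null) || true

# With expand_aliases enabled at startup, the alias is honored.
out_enabled=$(bash -O expand_aliases "$demo")

echo "default:        '${out_default}'"
echo "expand_aliases: '${out_enabled}'"
rm -f "$demo"
```

Starting bash with -O expand_aliases has the same effect as putting shopt -s expand_aliases at the top of the script.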

Anchor
list_input
list_input

A List of Input Files/Pulling data from the ith line of a file

Suppose that, rather than input files explicitly indexed by suffix, we have a list of 1000 input files in a file file_list.text, one per line:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleA List of Input Files/Pulling data from the ith line of a file
collapsefalse
[sm4082@login-0-2 ~]$ cat array.list
#!/bin/bash

#PBS -S /bin/bash
#PBS -l nodes=1:ppn=1,walltime=1:00:00

  INPUT_FILE=`awk "NR==$PBS_ARRAYID" file_list.text`
      #
      # ...or use sed:
      #        sed -n -e "${PBS_ARRAYID}p" file_list.text
      #
      # ...or use head/tail
      #        $(cat file_list.text | head -n $PBS_ARRAYID | tail -n 1)

  ./executable < "$INPUT_FILE"

In the sed alternative, the '-n' option suppresses all output except the line that is explicitly printed (the line whose number equals PBS_ARRAYID).

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
qsub -t 1-1000 array.list
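
The three ways of selecting a line can be compared outside the scheduler. The following sketch uses a small temporary file as a stand-in for file_list.text (the file names in it are made up) and sets PBS_ARRAYID by hand, since that variable is normally set only inside a running array job.

```shell
#!/bin/bash
# Stand-in for file_list.text: five made-up file names, one per line.
list=$(mktemp)
printf 'input_%d.dat\n' 1 2 3 4 5 > "$list"

# Set by the scheduler in a real array job; set by hand here.
PBS_ARRAYID=3

with_awk=$(awk -v n="$PBS_ARRAYID" 'NR == n' "$list")
with_sed=$(sed -n "${PBS_ARRAYID}p" "$list")
with_head_tail=$(head -n "$PBS_ARRAYID" "$list" | tail -n 1)

echo "awk:       $with_awk"
echo "sed:       $with_sed"
echo "head/tail: $with_head_tail"
rm -f "$list"
```

All three select the same line as long as PBS_ARRAYID does not exceed the number of lines in the file. If it does, awk and sed print nothing, while the head/tail pipeline returns the last line of the file, so the list length and the -t range should agree.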

Now suppose you have a list of 1000 numbers in a file, one number per line, for example random number seeds for a simulation. For each task in the array job, you want the ith line of the file, where i equals PBS_ARRAYID, to use as the seed. As above, this can be done with the Unix head and tail commands, or with awk or sed.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleA List of Input Files/Pulling data from the ith line of a file
collapsefalse
[sm4082@login-0-2 ~]$ cat array.seed
#!/bin/bash

#PBS -S /bin/bash
#PBS -l nodes=1:ppn=1,walltime=1:00:00

SEEDFILE=~/data/seeds
SEED=$(cat $SEEDFILE | head -n $PBS_ARRAYID | tail -n 1)
~/programs/executable $SEED > ~/results/output.$PBS_ARRAYID

Code Block
borderColorblack
bgColorgrey
borderWidth0
languagebash
themeConfluence
borderStylesolid
collapsefalse
qsub -t 1-1000 array.seed

You can use this trick for all sorts of things. For example, if your jobs all run the same program with very different command-line options, you can list the options in a file, one set per line, and proceed exactly as above. You then have only two files to manage (or three, if you use a Perl script to generate the file of command lines).
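
A minimal sketch of the options-file variant follows. The options shown and the temporary file are made up for illustration, PBS_ARRAYID is set by hand, and echo stands in for the real program so that the sketch only prints the command line it would run.

```shell
#!/bin/bash
# Stand-in options file: one set of command-line options per line.
optfile=$(mktemp)
cat > "$optfile" <<'EOF'
--alpha 0.1 --iters 100
--alpha 0.5 --iters 200
--alpha 0.9 --iters 50
EOF

# Set by the scheduler in a real array job; set by hand here.
PBS_ARRAYID=2

# Pull the line for this task, then split it on whitespace into an array
# so that each option is passed to the program as a separate word.
line=$(sed -n "${PBS_ARRAYID}p" "$optfile")
read -r -a opts <<< "$line"

# echo stands in for the real program (./executable is hypothetical).
result=$(echo ./executable "${opts[@]}")
echo "$result"
rm -f "$optfile"
```

Splitting on whitespace is sufficient only when no option value contains spaces; values with embedded spaces would need a different file format.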

Delete

Anchor
del_all
del_all

Delete all jobs in array

We can delete all the jobs in an array with a single command.

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleDeleting array of jobs
collapsefalse
$ qsub -t 2-5 array.pbs
5534020[].hpc0.local

$ qstat -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534020[].hpc0.l     sm4082   ser2     arraytest                1     1    --   00:05 R  --

$ qdel 5534020[]

$ qstat -u $USER
$

Anchor
del_single
del_single

Delete a single job in array

Individual jobs in an array can be deleted by index, e.g. jobs 4, 5 and 7:

Code Block
borderColorblack
bgColorgrey
titleColorblack
borderWidth1
titleBGColor#F7D6C1
languagebash
themeConfluence
borderStylesolid
titleDeleting a single job in array
collapsefalse
$ qsub -t 0-8 array.pbs
5534021[].hpc0.local

$ qstat -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534021[].hpc0.l     sm4082   ser2     arraytest                 1   1    --  00:05 Q   --

$ qstat -t -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534021[0].hpc0.     sm4082   ser2     arraytest-0       26618     1   1    --  00:05 R   --
5534021[1].hpc0.     sm4082   ser2     arraytest-1       14271     1   1    --  00:05 R   --
5534021[2].hpc0.     sm4082   ser2     arraytest-2       14304     1   1    --  00:05 R   --
5534021[3].hpc0.     sm4082   ser2     arraytest-3       14721     1   1    --  00:05 R   --
5534021[4].hpc0.     sm4082   ser2     arraytest-4       14754     1   1    --  00:05 R   --
5534021[5].hpc0.     sm4082   ser2     arraytest-5       14787     1   1    --  00:05 R   --
5534021[6].hpc0.     sm4082   ser2     arraytest-6       10711     1   1    --  00:05 R   --
5534021[7].hpc0.     sm4082   ser2     arraytest-7       10744     1   1    --  00:05 R   --
5534021[8].hpc0.     sm4082   ser2     arraytest-8        9711     1   1    --  00:05 R   --

$ qdel 5534021[4]
$ qdel 5534021[5]
$ qdel 5534021[7]

$ qstat -t -u $USER

hpc0.local:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
5534021[0].hpc0.     sm4082   ser2     arraytest-0       26618     1   1    --  00:05 R   --
5534021[1].hpc0.     sm4082   ser2     arraytest-1       14271     1   1    --  00:05 R   --
5534021[2].hpc0.     sm4082   ser2     arraytest-2       14304     1   1    --  00:05 R   --
5534021[3].hpc0.     sm4082   ser2     arraytest-3       14721     1   1    --  00:05 R   --
5534021[6].hpc0.     sm4082   ser2     arraytest-6       10711     1   1    --  00:05 R   --
5534021[8].hpc0.     sm4082   ser2     arraytest-8        9711     1   1    --  00:05 R   --

$ qstat -t -u $USER
$
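
When many array members have to be removed, their identifiers can be generated rather than typed one by one. The sketch below is an illustration only; the helper name array_task_ids is hypothetical, and the qdel command is printed rather than executed.

```shell
#!/bin/bash
# Hypothetical helper: print one identifier of the form JOBID[INDEX]
# for each index given after the array's job number.
array_task_ids() {
    local jobid=$1 idx
    shift
    for idx in "$@"; do
        printf '%s[%s]\n' "$jobid" "$idx"
    done
}

# Collect the identifiers for members 4, 5 and 7 of array 5534021.
ids=$(array_task_ids 5534021 4 5 7 | tr '\n' ' ')

# Print the command instead of running it.
echo "qdel $ids"
```

When typing such identifiers directly, quoting them (for example qdel '5534021[4]') keeps the shell from treating the square brackets as a filename pattern.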