- What is sbatch
- What does sbatch do?
- Arguments to control behavior
- Declare the date/time a job becomes eligible for execution
- Defining the working directory path to be used for the job
- Manipulate the output files
- Mail job status at the start and end of a job
- Submit a job to a specific queue
- Submitting a job that is dependent on the output of another
- Submitting multiple jobs in a loop that depend on output of another job
- Opening an interactive shell to the compute node
- Passing an environment variable to your job
- Submitting an array job: Managing groups of jobs
|Option|Description|
|---|---|
|-N, --nodes=<minnodes>|Request that a minimum of minnodes nodes be allocated to this job.|
|--ntasks-per-node=<ntasks>|Request that ntasks be invoked on each node.|
|--mem=<size>|Specify the real memory required per node.|
|--mail-type=<type>|Notify user by email when certain event types occur.|
|--mail-user=<user>|User to receive email notification of state changes as defined by --mail-type. The default value is the submitting user.|
|-o, --output=<filename pattern>|Instruct Slurm to connect the batch script's standard output directly to the file name specified in the "filename pattern".|
What is sbatch?
sbatch submits a batch script to Slurm.
For more information on sbatch, see `man sbatch`.
What does sbatch do?
Overview

sbatch submits a batch script to Slurm, which allocates nodes to the user. The batch script may be given to sbatch through a file name on the command line, or, if no file name is specified, sbatch will read in a script from standard input. The batch script may contain options preceded with "#SBATCH" before any executable commands in the script.

sbatch exits immediately after the script is successfully transferred to the Slurm controller and assigned a Slurm job ID. The batch script is not necessarily granted resources immediately; it may sit in the queue of pending jobs for some time before its required resources become available.

By default both standard output and standard error are directed to a file of the name "slurm-%j.out", where the "%j" is replaced with the job allocation number. The file will be generated on the first node of the job allocation. Other than the batch script itself, Slurm does no movement of user files.

When the job allocation is finally granted for the batch script, Slurm runs a single copy of the batch script on the first node in the set of allocated nodes.
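As a sketch, a minimal batch script might look like the following (the job name and output pattern are illustrative choices, not requirements):

```shell
#!/bin/bash
#SBATCH --job-name=hello        # job name shown in the queue
#SBATCH --nodes=1               # request one node
#SBATCH --output=slurm-%j.out   # %j is replaced by the job ID

# Executable commands follow the #SBATCH options
echo "Hello from $(hostname)"
```

Note that the `#SBATCH` lines are shell comments, so the same file also runs as an ordinary script outside Slurm.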
Environment variables in sbatch
INPUT ENVIRONMENT VARIABLES
Upon startup, sbatch will read and handle the options set in the following environment variables. Note that environment variables will override any options set in a batch script, and command line options will override any environment variables.
|Environment Variable|Option|
|---|---|
|SBATCH_ACCOUNT|Same as -A, --account|
|SBATCH_ACCTG_FREQ|Same as --acctg-freq|
|SBATCH_ARRAY_INX|Same as -a, --array|
|SBATCH_BLRTS_IMAGE|Same as --blrts-image|
|SBATCH_CHECKPOINT|Same as --checkpoint|
|SBATCH_CHECKPOINT_DIR|Same as --checkpoint-dir|
|SBATCH_CLUSTERS or SLURM_CLUSTERS|Same as --clusters|
|SBATCH_CNLOAD_IMAGE|Same as --cnload-image|
|SBATCH_CONN_TYPE|Same as --conn-type|
|SBATCH_CONSTRAINT|Same as -C, --constraint|
|SBATCH_CORE_SPEC|Same as --core-spec|
|SBATCH_DEBUG|Same as -v, --verbose|
|SBATCH_DELAY_BOOT|Same as --delay-boot|
|SBATCH_DISTRIBUTION|Same as -m, --distribution|
OUTPUT ENVIRONMENT VARIABLES
The Slurm controller will set the following variables in the environment of the batch script.
|Environment Variable|Description|
|---|---|
|BASIL_RESERVATION_ID|The reservation ID on Cray systems running ALPS/BASIL only.|
|MPIRUN_NOALLOCATE|Do not allocate a block on Blue Gene L/P systems only.|
|MPIRUN_NOFREE|Do not free a block on Blue Gene L/P systems only.|
|MPIRUN_PARTITION|The block name on Blue Gene systems only.|
|SLURM_CPU_BIND|Set to value of the --cpu_bind option.|
|SLURM_CPU_BIND_VERBOSE|Set to "verbose" if the --cpu_bind option includes the verbose option. Set to "quiet" otherwise.|
|SLURM_CPU_BIND_TYPE|Set to the CPU binding type specified with the --cpu_bind option. Its value consists of up to two comma-separated strings. The first identifies the entity to be bound to: "threads", "cores", "sockets", "ldoms" or "boards". The second identifies the manner in which tasks are bound: "none", "rank", "map_cpu", "mask_cpu", "rank_ldom", "map_ldom" or "mask_ldom".|
|SLURM_CPU_BIND_LIST|Set to bit mask used for CPU binding.|
|SLURM_MEM_BIND|Set to value of the --mem_bind option.|
|SLURM_MEM_BIND_LIST|Set to bit mask used for memory binding.|
Arguments to control behavior
As stated before, there are several arguments that you can use to make your jobs behave in a specific way. This is not an exhaustive list, but it covers some of the most widely used options and many that you will probably need to accomplish specific tasks.
Declare the date/time a job becomes eligible for execution
To set the date/time at which a job becomes eligible to run, use the --begin argument. If --begin is not specified, sbatch assumes that the job should be run immediately.
1) Defer job until 5 minutes later.
2) Set a specific date/time to run the job. Defer job until 16:10:00 (HH:MM:SS MM/DD/YY).
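Assuming a submission script named job.sh (a placeholder name), the two cases above correspond to:

```shell
# Case 1: defer the job until 5 minutes from now
sbatch --begin=now+5minutes job.sh

# Case 2: defer the job until 16:10:00; a full date may also be
# given in the form HH:MM:SS MM/DD/YY
sbatch --begin=16:10:00 job.sh
```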
Defining the working directory path to be used for the job
To define the working directory path to be used for the job, the --workdir option can be used. If it is not specified, the default working directory is the directory from which the sbatch command is executed.
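For example, with a hypothetical scratch directory:

```shell
# Run the job with /scratch/myuser/run1 (placeholder path) as its
# working directory instead of the submission directory
sbatch --workdir=/scratch/myuser/run1 job.sh
```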
Manipulate the output files
To write standard output to a file, specify --output option of sbatch. To write standard error to a file, specify --error option.
slurm_%j.out is the filename, where "%j" is replaced by the job ID.
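For example (job.sh is a placeholder script name):

```shell
# Write the job's stdout and stderr to separate files;
# %j expands to the job ID when the job runs
sbatch --output=slurm_%j.out --error=slurm_%j.err job.sh
```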
Mail job status at the start and end of a job
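A minimal sketch of a batch script that asks Slurm to send mail when the job starts and ends (the job name and address are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=mail_demo
#SBATCH --mail-type=BEGIN,END          # mail at job start and job end
#SBATCH --mail-user=user@example.com   # placeholder address

echo "Job running on $(hostname)"
```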
Submit a job to a specific queue
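In Slurm, queues are called partitions and are selected with the -p/--partition option. The partition name below is a placeholder; use `sinfo` to list the partitions available on your cluster:

```shell
# Submit job.sh to the (hypothetical) "general" partition
sbatch --partition=general job.sh

# Equivalently, inside the batch script:
#   #SBATCH --partition=general
```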
Submitting a job that is dependent on the output of another
Often you will have jobs that will be dependent on another for output in order to run.
To illustrate the ability to hold execution of a specific job until another has completed, we will write two submission scripts. The first will create a list of random numbers. The second will sort those numbers. Since the second script will depend on the list that is created we will need to hold execution until the first has finished.
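A sketch of the two submission scripts follows; sort.sh and rand.list match the names used below, while rand.sh and sorted.list are assumed names:

```shell
# rand.sh: write 100 random numbers, one per line, to rand.list
cat > rand.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=rand
for i in $(seq 1 100); do
    echo "$RANDOM"
done > rand.list
EOF

# sort.sh: numerically sort rand.list into sorted.list
cat > sort.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=sort
sort -n rand.list > sorted.list
EOF
```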
Once the files are created, let's see what happens when they are submitted at the same time:
Since they both ran at the same time, the sort script failed because the file rand.list had not been created yet. Now submit them with the dependencies added.
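One way to express the dependency uses sbatch's --parsable flag, which prints just the job ID of the submitted job:

```shell
# Submit the generator first and capture its job ID
jobid=$(sbatch --parsable rand.sh)

# sort.sh stays pending until the first job completes successfully
sbatch --dependency=afterok:${jobid} sort.sh
```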
We now see that the sort.sh job is in a hold state. Once the job it depends on completes, the sort job runs and we see:

The numbers are now sorted in increasing order, as expected.
Submitting multiple jobs in a loop that depend on output of another job
This example shows how to submit multiple jobs in a loop, where each job depends on the output of the job submitted before it.

Let's say we need to write the numbers from 0 to 29, in order, to a file output.txt. We can do this in 3 separate runs, where each run has its own sbatch script that writes 10 numbers to the output file. Let's see what happens if we submit all 3 jobs at the same time.
The script below creates required sbatch scripts for all the runs.
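A sketch of such a generator (the names run_0.sh, run_1.sh, run_2.sh are assumptions); run_i appends the numbers 10*i through 10*i+9 to output.txt:

```shell
# Generate run_0.sh, run_1.sh and run_2.sh
for i in 0 1 2; do
    start=$((i * 10))
    end=$((start + 9))
    cat > run_${i}.sh <<EOF
#!/bin/bash
#SBATCH --job-name=run_${i}
for n in \$(seq ${start} ${end}); do
    echo \$n >> output.txt
done
EOF
done
```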
Print top 10 lines of output.txt.
This clearly shows that the numbers are not in the order we wanted, because all three runs wrote to the same file at the same time.
Let's submit jobs using sbatch dependency feature. This can be achieved with a simple script shown below.
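A sketch of such a script, assuming the run scripts are named run_0.sh through run_2.sh:

```shell
# Submit the first run, then chain each later run on the previous one
jobid=$(sbatch --parsable run_0.sh)
for i in 1 2; do
    jobid=$(sbatch --parsable --dependency=afterok:${jobid} run_${i}.sh)
done
```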
This shows that the numbers are written in order to output.txt, which in turn shows that each job ran only after the successful completion of the previous one.
Opening an interactive shell to the compute node
The srun command allows the user to run an interactive job on a compute node. Unlike sbatch, srun does not just submit the job: it also waits for the job to start and connects stdin to the current terminal. srun accepts the following options:

- -n: Specify the number of tasks to run, e.g. -n4. Default is one CPU core per task.
- -t: Request job running duration, e.g. -t 1:00:00 for one hour.
- --mem: Specify the real memory required per node in megabytes, e.g. --mem=4000.
- --pty: Execute the first task in pseudo terminal mode, e.g. --pty /bin/bash, to start a bash command shell.
- --x11: Enable X forwarding, so programs using a GUI can be used during the session (provided you have X forwarding to your workstation set up).
The job is terminated when the shell is exited.
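Putting the options together, an interactive session could be started with, for example:

```shell
# One task, one hour, 4000 MB per node, interactive bash shell
srun -n1 -t 1:00:00 --mem=4000 --pty /bin/bash
```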
Passing an environment variable to your job
You can pass user defined environment variables by using --export option.
To test this we will use a simple script that prints out an environment variable.
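A sketch of that test; the script name and variable are placeholders:

```shell
# test_env.sh simply prints the environment variable MYVAR
cat > test_env.sh <<'EOF'
#!/bin/bash
echo "MYVAR is: $MYVAR"
EOF

# Submit it, passing MYVAR=42 (ALL also forwards the rest of the
# submitting environment):
#   sbatch --export=ALL,MYVAR=42 test_env.sh
```

The job's output file should then contain the line `MYVAR is: 42`.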
Submitting an array job: Managing groups of jobs
Sometimes users will want to submit large numbers of jobs based on the same job script.
First we need to create data to be read. Note that in a real application, this could be data, configuration settings, or anything that your program needs to run.
Create Input Data
To create input data, run this simple one-liner:
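For instance (file names are placeholders), one file per future array task:

```shell
# Create data_1.txt .. data_5.txt, each with one line of sample input
for i in $(seq 1 5); do echo "sample input ${i}" > "data_${i}.txt"; done
```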
Submit & Monitor
Instead of running five sbatch commands, we can simply enter:
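Assuming the batch script is named job_array.sh (a placeholder), a single command launches all five tasks:

```shell
# Launch array tasks 1..5 in one call; each task sees its own
# index in the $SLURM_ARRAY_TASK_ID environment variable
sbatch --array=1-5 job_array.sh
```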
Comma delimited lists
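Array indices do not have to form a range; a comma-delimited list selects exactly the tasks given:

```shell
# Run only the tasks with indices 1, 3 and 5
sbatch --array=1,3,5 job_array.sh
```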
A more general for loop - Arrays with step size
To submit jobs in steps of a certain size, let's say step size of 3 starting at 0 and ending at 10, one can do
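Using the same placeholder script name as above:

```shell
# Start at 0, end at 10, step 3: runs tasks 0, 3, 6 and 9
sbatch --array=0-10:3 job_array.sh
```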
A List of Input Files/Pulling data from the ith line of a file
Suppose we have a list of 1000 input files, rather than input files explicitly indexed by suffix, in a file file_list.text one per line:
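A sketch of a batch script that picks its input file from the line of file_list.text whose line number equals the task's array index:

```shell
#!/bin/bash
#SBATCH --array=1-1000
# sed -n suppresses all output; "<line>p" prints only the chosen line
INFILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" file_list.text)
echo "Task ${SLURM_ARRAY_TASK_ID} processes ${INFILE}"
```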
In this example, sed's '-n' option suppresses all output except that which is explicitly printed (the line whose number equals SLURM_ARRAY_TASK_ID).
Let’s say you have a list of 1000 numbers in a file, one number per line. For example, the numbers could be random number seeds for a simulation. For each task in an array job, you want to get the ith line from the file, where i equals SLURM_ARRAY_TASK_ID, and use that value as the seed. This is accomplished by using the Unix head and tail commands or awk or sed just like above.
You can use this trick for all sorts of things. For example, if your jobs all use the same program, but with very different command-line options, you can list all the options in the file, one set per line, and the exercise is basically the same as the above, and you only have two files to handle (or 3, if you have a perl script generate the file of command-lines).
Delete all jobs in array
We can delete all the jobs in array with a single command.
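For example, with 1234 standing in for the array's job ID:

```shell
# Cancel every task of array job 1234 at once
scancel 1234
```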
Delete a single job in array
We can also delete individual jobs in the array, e.g. tasks 4, 5 and 7.
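Again using 1234 as a placeholder job ID, individual array tasks are addressed as jobid_index:

```shell
# Cancel only tasks 4, 5 and 7 of array job 1234
scancel 1234_4 1234_5 1234_7
```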