To submit parallel jobs, or serial jobs longer than 6 hours, you must first complete the training. Instructions: Joining Dalma Online Training on iLearn
Before submitting jobs on Dalma, please make sure that you are familiar with the following basic Linux topics:
- How to log in to a remote host using ssh.
- How to manage (create, remove, move and copy) files and directories.
- How to edit a file without a GUI, using a simple text editor (vi, nano or emacs).
- User/group structure and file permissions (e.g., what does "ls: cannot open directory : Permission denied" mean?).
- Reading and writing files with commands like head, tail, cat, and less
- I/O redirection (e.g., stdout, stderr, >, 2>, >>)
- Environmental variables like $PATH and $LD_LIBRARY_PATH (e.g., what does "-bash: matlab: command not found" mean?)
If not, we strongly advise you to learn these topics first; doing so could easily save you days of troubleshooting later. Plenty of tutorials are freely available online.
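As a quick illustration of the last few items, here is a minimal shell sketch; myprog and the file names are placeholders, not commands or files that exist on Dalma:

```bash
# Peek at a file without opening an editor
head -n 5 results.txt                # first 5 lines
tail -n 5 results.txt                # last 5 lines

# I/O redirection: send stdout and stderr to separate files; >> appends
./myprog > out.log 2> err.log
./myprog >> out.log 2>> err.log

# The shell searches the directories in $PATH to find commands, so
# "-bash: matlab: command not found" usually means $PATH lacks the right directory
echo $PATH
export PATH=$PATH:/path/to/your/tool/bin
```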
Dalma is accessed through a dedicated set of login nodes, which are designed for lightweight, short tasks. Access to the compute, GPU and visualization nodes for production runs is controlled by the workload manager Slurm. Production jobs are submitted to Slurm from the login nodes; Slurm then schedules and runs them on the compute nodes.
Getting and Renewing an Account
First, you need a valid NetID (see the NYU Home Page). Then, to request or renew your access to NYUAD HPC, follow the instructions on the Accounts page and go to the NYU High Performance Computing (HPC) account management site with your NetID: https://iiq.home.nyu.edu/identityiq
All NYUAD staff and students must have a faculty sponsor. Click here for more information.
Once you have an HPC account, you are ready to access the cluster. In the simplest case, you log in over ssh from a terminal, as sketched below.
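For example, from a Linux or macOS terminal you would connect with ssh. The hostname below is the address commonly used for the Dalma login nodes; treat it as an assumption and confirm it on the Access Dalma page:

```bash
# Replace <NetID> with your NYU NetID
ssh <NetID>@dalma.abudhabi.nyu.edu
```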
If you are using Windows, or prefer a GUI to the terminal, check the Access Dalma page for instructions.
Right after logging in to Dalma, you are automatically placed in $HOME. Dalma storage consists of 4 filesystems, /home/<NetID>, /fastscratch/<NetID>, /scratch/<NetID> and /archive/<NetID>, which can be referenced through the environment variables $HOME, $FASTSCRATCH, $SCRATCH and $ARCHIVE. Note that $HOME is for small permanent storage, $FASTSCRATCH and $SCRATCH are for large reads and writes with large files, and $ARCHIVE is for long-term storage. Access to $FASTSCRATCH is granted upon request and approval.
The quota of $HOME is only 5GB. Run myquota in terminal on Dalma to check your current usage.
```bash
# It is much less error-prone to access the different file systems
# through the environment variables than through absolute paths.
# For example
cd $HOME        # is equivalent to: cd /home/<your-NetID>
# or
ls $SCRATCH
```
Files not accessed for 90 days in $FASTSCRATCH and $SCRATCH will be deleted.
- Submit your jobs and prepare your input/output in $SCRATCH (a workflow sketch follows this list).
- Put your source code, applications and executables in $HOME.
- Back up your data in $ARCHIVE.
- Contact us if you need to use $FASTSCRATCH.
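Putting these guidelines together, a typical workflow might look like the following sketch. The directory, file and script names (my_project, input.dat, my-job.sh) are placeholders, not real examples from Dalma:

```bash
# Source code and executables live in $HOME
cd $HOME/my_project                  # hypothetical project directory

# Stage input and run jobs from $SCRATCH
mkdir -p $SCRATCH/my_project
cp input.dat $SCRATCH/my_project/    # hypothetical input file
cd $SCRATCH/my_project
sbatch my-job.sh                     # hypothetical job script

# When the run is done, back up important results to $ARCHIVE
tar czf $ARCHIVE/my_project_results.tar.gz output/
```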
Users cannot request all of the physical memory on a node in their job scripts; some memory is reserved for the system. See the table below.
| Node Type | Number of Nodes | Hardware per Node | Maximum Memory per Node a User Can Request | Note |
|---|---|---|---|---|
| Standard Compute | 236 | 128GB memory, 28 cores, Broadwell | 112GB | |
| Fat | 8 | 192GB memory, 12 cores, Westmere; 1TB memory, 32 cores, Westmere | 180GB | |
| Ultra Fat | 1 | 2TB memory, 72 cores, Broadwell | 2000GB | Consult with us for access to this node |
| GPU | 16 | 96GB memory, 12 cores, Westmere, NVIDIA GPU; 128GB memory, 28 cores, Broadwell | 90GB | |
A new Module Environment is now available on Dalma, as part of the User Centric Approach we have been promoting at NYUAD to manage the software stack. This new Module Environment, NYUAD 3.0, overcomes the flaws of the traditional modules environment when used to manage complex modern software environments.
First, check which applications are available:
```bash
# Run the following command after logging in to Dalma
module avail
```
Then select the desired software to load. The following example shows how to load a self-sufficient, single-application environment for gromacs.
```bash
# Run the following commands after logging in to Dalma
module load NYUAD/3.0
module load gromacs
# or use the full module name
module load gromacs/5.0.4
```
The following example shows how to load an environment for compiling source code from scratch.
```bash
# Run the following commands after logging in to Dalma
module load NYUAD/3.0
module load gcc
# multiple modules can be loaded in one line
module load openmpi fftw3
```
The batch system on Dalma is Slurm (Simple Linux Utility for Resource Management), a free, open-source resource manager originally developed at LLNL. As on most supercomputers, production jobs on Dalma are submitted to the batch system. To submit a job, you create a submission script in which you specify your resource requirements. Before jobs are dispatched to run, they wait in partitions for available processing resources. There are partitions for various types of use: the parallel partition allocates entire nodes to a job (i.e., only one job per node), while the serial partition allows multiple jobs to share one node.
Available Partitions (Queues)
Three partitions are exposed to users:
- serial: for jobs using no more than 1 node.
- parallel: for jobs using more than 1 node.
- preempt: for serial jobs that can be re-queued and need less than 30 minutes of walltime.
Please refer to this page to check what job limit applies to you according to your affiliation.
Writing a Batch Script
A job script is a text file describing the job and the resources it requires. Slurm has its own directives, but it is similar in many ways to PBS or LSF. Moreover, Slurm maintains good compatibility with PBS scripts; in many cases a PBS script is accepted directly.
To submit serial jobs longer than 6 hours, you must first complete the training. Instructions: Joining Dalma Online Training on iLearn
Serial Job Example
A typical Slurm serial job script looks like this. Let's say you save it as serial-job.sh:
```bash
#!/bin/bash
#SBATCH -p serial
# Set number of tasks to run
#SBATCH --ntasks=1
# Walltime format hh:mm:ss
#SBATCH --time=00:30:00
# Output and error files
#SBATCH -o job.%J.out
#SBATCH -e job.%J.err

# **** Put all #SBATCH directives above this line! ****
# **** Otherwise they will not be effective! ****

# **** Actual commands start here ****
# Load modules here (safety measure)
module purge
# You may need to load gcc here; this is application specific
# module load gcc

# Replace this with your actual command, e.g. 'serial-hello-world'
hostname
```
The script above is a generic Slurm job script, with a brief explanation of each directive given in the comments.
Then you can submit the saved job script serial-job.sh with:
```bash
sbatch serial-job.sh
```
Parallel Job Example
To submit parallel jobs, you must first complete the training. Instructions: Joining Dalma Online Training on iLearn
A typical Slurm parallel job script looks like this. Let's say you save it as parallel-job.sh:
```bash
#!/bin/bash
#SBATCH -p parallel
# Set number of tasks to run
# To maximize performance, set ntasks to be divisible by 28, e.g., 56, 84...
#SBATCH --ntasks=56
# Walltime format hh:mm:ss
#SBATCH --time=00:30:00
# Output and error files
#SBATCH -o job.%J.out
#SBATCH -e job.%J.err

# **** Put all #SBATCH directives above this line! ****
# **** Otherwise they will not be effective! ****

# **** Actual commands start here ****
# Load modules here (safety measure)
module purge
# You may need to load gcc here; this is application specific
# module load gcc

# Replace this with your actual command, e.g. 'srun roms'
srun hostname
```
Then you can submit the saved job script parallel-job.sh with:
```bash
sbatch parallel-job.sh
```
Submitting a Job
Please be aware that submitting jobs is only possible from login nodes at the moment. Contact us if you need help.
The sbatch command is used to submit jobs. A simple example:
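For instance, submitting the serial script saved earlier (any other job script is submitted the same way):

```bash
sbatch serial-job.sh
```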
After submission, sbatch returns the corresponding job ID. Once the job is scheduled to run, the script is executed on the first compute node in the allocation.
Checking Job Status
Before and During Job Execution
The following command lists all the jobs of a given user:
squeue -u <username>
You can also check a specific job by its ID, for example:

```
[gh50@login-0-1 ~]$ squeue -j 31408
  JOBID PARTITION     NAME   USER ST   TIME  NODES NODELIST(REASON)
  31408   ser_std  job1.sh   gh50  R   0:02      1 compute-21-4
```
This means that job 31408 has been running (ST: R) for 2 minutes on compute-21-4.
For more verbose information, use scontrol show job.
scontrol show job <jobid>
After Job Execution
Once the job has finished, it can no longer be inspected with squeue or scontrol show job. At that point, inspect the job with sacct.
sacct -j <jobid>
The following command gives you extremely verbose information on a job.
sacct -j <jobid> -l
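If -l is more than you need, sacct can also print a chosen set of fields with --format. This is standard Slurm behaviour rather than anything Dalma-specific; adjust the field list to your needs:

```bash
# Show a few commonly useful fields for the job
sacct -j <jobid> --format=JobID,JobName,Partition,State,Elapsed,MaxRSS
```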
Canceling a Job
If you decide to end a job prematurely, use scancel.
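For example, to cancel a single job by its ID (the ID printed by sbatch and shown by squeue):

```bash
scancel <jobid>
```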
To cancel all jobs from your account, run the following (scancel's standard -u option) in a Dalma terminal:
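```bash
# -u cancels every job belonging to the given user (standard Slurm option)
scancel -u <NetID>
```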
To start an interactive session, use the srun command:
srun --pty -n 1 /bin/bash
srun requests the resources from Slurm and opens a shell on a compute node once they are allocated.
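If you need more than the defaults, the same options used in batch scripts can be passed to srun. The partition, task count and walltime below are only illustrative:

```bash
# Request 4 tasks on the serial partition for 1 hour, then open an interactive shell
srun -p serial -n 4 -t 01:00:00 --pty /bin/bash
```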