The simplest queuing algorithm is "first come, first served". The queuing of jobs on the HPC cluster is a little more sophisticated as we pursue several goals:
- Minimal queuing times, especially for short jobs. Nobody wants to spend 4 hours in the queue for a 1-hour job.
- Efficient use of the available resources. If there is a job ready which can use hardware that would otherwise be idle, run it, even if it's not next in the queue.
- Fair use of resources. If you've made heavy use of the cluster recently, jobs belonging to a user who has had less CPU time will get higher priority. At NYU, "recently" means "the last 24 hours", so users with large workloads are not excessively penalized.
Should you need more resources than the fair share allocations because of critical deadlines such as a grant application, a publication deadline, or class use, please email email@example.com to make special arrangements.
Moab supports these goals by calculating a priority for each submitted job and placing the job in the queue according to its priority. The schedule of which job will run where and when is built from the job queues. When a job finishes earlier than scheduled (due to an overestimated walltime request), Moab attempts to fill the newly-available space by scanning the queue for the first job which will fit without delaying an already-scheduled, higher priority job. In this way low-priority jobs with smaller resource requirements can jump ahead and be run early.
You can take advantage of this by requesting CPU, walltime and memory resources as accurately as possible. Be careful not to request too little though, or your job may exceed the request and be killed.
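For example, a job script header with explicit resource requests might look like the following sketch (the node, core, walltime, and memory values are illustrative, not recommendations):

```bash
#!/bin/bash
# Illustrative resource requests -- adjust to what your job actually needs.
#PBS -l nodes=1:ppn=4         # 1 node, 4 cores
#PBS -l walltime=02:00:00     # 2 hours of walltime
#PBS -l mem=8gb               # 8 GB of memory
```

A modest, accurate request like this is easier for the scheduler to backfill into otherwise idle slots than an inflated one.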
Monitoring jobs with qstat
To see the status of a single job - or a list of specific jobs - pass the job IDs to qstat, as in the following example:
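For example (the job IDs and queue name here are invented, and the output columns may differ slightly between versions):

```
$ qstat 1234567 1234568

Job ID          Name         User     Time Use S Queue
--------------- ------------ -------- -------- - -----
1234567         myjob        ab123    03:12:45 R batch
1234568         myjob2       ab123           0 Q batch
```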
Most of the fields in the output are self-explanatory. The second-to-last column, "S", is the job status, which can be:
- Q meaning "Queued"
- H meaning "Held" - this may be the result of a manual hold or of a job dependency
- R meaning "Running"
- C meaning "Completed". After the job finishes, it will remain with "completed" status for a short time before being removed from the batch system.
Other, less common job status flags are described in the qstat manual page (man qstat). The qstat command is described in more detail here.
What is running on the cluster, and where? Interpreting pbstop
When will my job start?
You can get an estimate of the scheduled starting time for a job with showstart.
Note that showstart is based on the scheduled time - which might be adjusted as other jobs are added to the queue and if already-running jobs finish ahead of time.
Also, if you've only just submitted the job, the scheduler might not have seen it yet. Moab only collects new jobs to schedule every ~15 seconds.
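For example (the job ID and times below are invented, and the exact output format depends on the Moab version):

```
$ showstart 1234567
job 1234567 requires 16 procs for 1:00:00

Estimated Rsv based start in         2:34:56 on Tue Jul  1 14:30:00
Estimated Rsv based completion in    3:34:56 on Tue Jul  1 15:30:00
```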
Setting job priorities
If you have several jobs in the queue and would like some of them to be prioritized over others, you can set the relative priority of a job by submitting it with qsub -p priority job-script, where priority is a number between -1024 and +1023 (a higher number means higher priority; the default is 0) and job-script is the name of your job script.
This only affects the priority of a job relative to other jobs owned by you - it does not affect the priority of your job compared to any job belonging to a different user.
You can also pass -p as a PBS directive within your job script:
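For example (the priority value 512 and the walltime are arbitrary):

```bash
#!/bin/bash
#PBS -p 512                   # raise this job's priority relative to my other jobs
#PBS -l walltime=01:00:00

# ... rest of your job script ...
```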
For more about qsub and PBS directives, see Writing and submitting a job.
Why hasn't my job started?
You can get information about what is preventing a queued job from running with checkjob.
The output of checkjob is complicated and technical. Most often a job remains in the queue because it is waiting for resources to become available (you can check how busy the system is with pbstop). Other likely causes are that it is waiting on a job dependency, or that you have reached the limit of simultaneously running jobs for a single user. If your job has been waiting a long time and you would like help understanding why, contact us.
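For example (the job ID is invented):

```
$ checkjob 1234567
```

Passing -v increases the verbosity, making checkjob print additional detail about why particular nodes were rejected.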
In the example below the job requested 12 large-memory nodes, and the blue text on the last line indicates that the scheduler has not yet found a large enough timeslot in which it can run (note that it has found four such nodes available).