You can request GPUs just as you request ppn (processors per node).
Options to request compute resources:
- `-l walltime=time`: Maximum wallclock time the job will need. Default is 1 hour. Walltime is specified in seconds or as `hh:mm:ss`.
- `-l mem=memory`: Maximum memory per node the job will need. Default depends on the queue: normally 2GB for serial jobs and the full node for parallel jobs. Memory should be specified with units, e.g. `mem=4GB`.
- `-l nodes=num:ppn=num`: Number of nodes and number of processors per node required. Default is 1 node and 1 processor per node. The `:ppn=num` can be omitted, in which case (at NYU HPC) you will get full nodes. When using multiple nodes, the job script will be executed on the first allocated node.
- `-q queue`: Submit to a specific queue. If not specified, Torque will choose a queue based on the resources requested.
A job submitted without requesting a specific queue or resources will go to the default serial queue (s48 on Mercer) with the default resource limits for that queue.
Requesting the resources you need, as accurately as possible, allows your job to be started at the earliest opportunity as well as helping the system to schedule work efficiently to everyone's benefit.
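Putting these options together, a job script for 4 cores, 4GB of memory and 4 hours of walltime might begin like this (a sketch: the script body and `my_program` are placeholders for your own work):

```shell
#!/bin/bash
#PBS -l walltime=04:00:00   # 4 hours of wallclock time
#PBS -l mem=4GB             # 4GB of memory on the node
#PBS -l nodes=1:ppn=4       # 1 node with 4 processors
#PBS -q s48                 # optional: name a specific queue

# Jobs start in the home directory; move to where the job was submitted from
cd "$PBS_O_WORKDIR"
./my_program                # placeholder for your own program
```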
For example, you can request:
- 1 node with 1 core and 1 GPU
- 1 node with 1 core and 1 GPU, specifically a Titan Black GPU
- 1 node with 1 core and 1 GPU, specifically an Nvidia K80 GPU
- 1 node with 4 Titan GPUs. Note that we request `ppn=4` too; it is always best to request at least as many CPU cores as GPUs.
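As qsub command lines, these requests might look like the following. The `gpus=` syntax is standard Torque; the specific GPU feature names (`titanblack`, `k80`, `titan`) are assumptions here, so check the node configuration table for the exact names in use:

```shell
# 1 node with 1 core and 1 GPU
qsub -l nodes=1:ppn=1:gpus=1 myscript.q

# 1 node with 1 core and 1 GPU, specifically a Titan Black (feature name assumed)
qsub -l nodes=1:ppn=1:gpus=1:titanblack myscript.q

# 1 node with 1 core and 1 GPU, specifically an Nvidia K80 (feature name assumed)
qsub -l nodes=1:ppn=1:gpus=1:k80 myscript.q

# 1 node with 4 cores and 4 Titan GPUs (feature name assumed)
qsub -l nodes=1:ppn=4:gpus=4:titan myscript.q
```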
The available GPU node configurations are shown here.
When you request GPUs, the system will set two environment variables - we strongly recommend you do not change these:
- CUDA_VISIBLE_DEVICES has a comma-separated list of the device IDs this job is allowed to use (e.g. "2,3"). The CUDA library within the application uses this to prevent multiple GPU jobs on the same node from interfering with each other.
- CUDA_DEVICES has a zero-based sequence of the "logical device IDs" for your job (e.g. "0 1"). So, if your application expects a list of GPU IDs starting at zero, and you have been allocated GPU numbers 2 and 3, you can pass $CUDA_DEVICES to your application and it will see 2 devices, named "0" and "1", which correspond (via $CUDA_VISIBLE_DEVICES) to the GPUs whose physical IDs are "2" and "3".
To your application, it will look like you have GPUs 0, 1, ... (up to as many GPUs as you requested). So if, for example, you request 2 GPUs and are allocated GPU 2 and GPU 3, you will have CUDA_VISIBLE_DEVICES=2,3 and CUDA_DEVICES="0 1".
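As an illustration, this is what the environment would contain for a job allocated GPUs 2 and 3. On the cluster the batch system sets these for you; the `export` lines here only simulate that:

```shell
# Simulating what the batch system sets for a job allocated physical GPUs 2 and 3:
export CUDA_VISIBLE_DEVICES="2,3"   # physical device IDs the job may use
export CUDA_DEVICES="0 1"           # zero-based logical IDs for the same GPUs

echo "$CUDA_VISIBLE_DEVICES"        # prints: 2,3
echo "$CUDA_DEVICES"                # prints: 0 1
```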
Now if your application calls "cudaSetDevice(0)", it will use the GPU that appears as device 0 but is physically device 2. A call to "cudaSetDevice(3)" will return an error, because as far as the application can see, the node has only 2 GPUs, numbered 0 and 1.