Where am I?
The login nodes of each cluster have names like "login-0-1". You can add the following idiom to your $HOME/.bashrc file to set the prompt to the name of the cluster:
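A minimal sketch of such an idiom, assuming the login nodes have fully-qualified names like login-0-1.mercer.nyu.edu, so that the second dot-separated field is the cluster name (an assumption - check your own hostname first):

```shell
# Add to $HOME/.bashrc: derive the cluster name from the full hostname.
# Assumes fully-qualified names like login-0-1.mercer.nyu.edu, where the
# second dot-separated field is the cluster name.
CLUSTER=$(hostname -f 2>/dev/null | cut -d. -f2)
export PS1="${CLUSTER}\$ "
```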
Something went wrong!
You can renew your account even after it has expired; see Getting or renewing an HPC account for how to renew. Data on /home, /archive and /work is deleted 90 days after account expiration. Data on /scratch is deleted after 60 days of not being used, whether your account is current or expired.
When you run the 'ls' command, a remote procedure call (RPC) is made to the metadata server (MDS), which returns a list of the files in the current directory. If you add certain options, such as --color=tty, then for each file in the list, ls calls stat() on that file. The stat() call involves an RPC to the MDS and an RPC to the object storage server (OSS) which holds the file itself. These RPCs, especially those to the OSS, can take a long time.
I get a message about bad permissions
SSH is fussy about permissions on the $HOME/.ssh directory. It should have rwx permission for the owner, and no permission at all for anyone else. You can set these permissions with the command:
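For example (rwx for the owner and nothing for anyone else corresponds to mode 700):

```shell
# Create the directory if needed, then restrict it to the owner only
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"   # rwx for owner, no permission for group/other
```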
- Set up .ssh/config on your workstation according to the example here.
- Delete your .ssh/known_hosts file. You will then be asked about connecting to a new host the first time you connect; you can safely answer "yes".
OSX roams "aggressively", which means that if it can see multiple access points it will abandon a working connection to pursue another one which might be better. The Bobst library supports a lot of wireless users and thus has many wireless access points, and Mac laptops behave like undisciplined children in a candy store, authenticating to one point only to then disconnect and try another. Some actions which might help:
- Under Network preferences->WiFi->Advanced, remove all NYU networks except "nyu" from the "preferred networks" list, and move "nyu" to the top of the list
If that fails, you can disable the aggressive roaming from the terminal, with the command:
A2: If you are using Windows:
- If you are running the PeerGuardian personal firewall software, switch it off (it disables DHCP). Otherwise:
Recent versions of Windows take a supposedly-more-secure but also less reliable approach to authenticating to a wireless network, which causes network connections to be dropped unnecessarily. A pop-up bubble in the bottom corner of the screen which says "please re-enter your password" is an indication that this is happening.
- Instead of using the Windows-supplied wifi drivers, download and install the most recent driver from the manufacturer of your wireless-network-interface card
A3: If all this fails:
Come see the DS helpers - they may have another trick or two up their sleeve
If your script (or interactive environment) uses bash (the default) or sh:

If your script (or interactive environment) uses csh or tcsh:

In the case of a PBS job script, add one of the above lines before the first "module" command in your script.
If you are seeing the error in an interactive shell, run one of the above commands at the prompt, then attempt the "module load" command again.
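On many systems using Environment Modules, the initialization lines look like this (the exact paths are assumptions - check /etc/profile.d on your cluster):

```
# bash/sh:
source /etc/profile.d/modules.sh

# csh/tcsh:
source /etc/profile.d/modules.csh
```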
Preparing your Mac for X
If you wish to use any software with a graphical interface, you will need an X server. This is a software package that draws on your local screen windows created on a remote computer (such as an NYU HPC cluster).
- Download and install XQuartz
Preparing your Windows workstation for X
If you wish to use any software with a graphical interface, you will need an X server. This is a software package that draws on your local screen windows created on a remote computer (such as an NYU HPC cluster). There are a couple of options out there:
- We recommend Cygwin/X. Instructions for downloading and installing it can be found here.
Before starting PuTTY you will need to have the X server running, by double-clicking the "XWin Server" shortcut under Cygwin-X on the Start Menu. You may wish to add this to your Windows Startup folder so it runs automatically after starting Windows.
- Another good option is Xming. Installation instructions can be found on its web site.
As with Cygwin/X, you will need to launch Xming before starting PuTTY.
You will also need to download and install PuTTY SSH if you have not already.
$ARCHIVE directory - if you experience trouble with this please contact us.
The /scratch filesystem is configured for large-block I/O, such as sequential reading and writing of large files. However, individual I/O operations are relatively costly, so programs using frequent, small I/O accesses will put a heavy load on the metadata servers, which in extreme cases can cause the system to become unstable. The system administrators generally detect this quickly and may kill a job whose I/O characteristics are stretching the capabilities of the filesystem (if this happens, we will contact you to help configure your job for better stability and performance).
- You are trying to run a simulation interactively instead of via the batch system. Please read Running jobs on the NYU HPC clusters, especially Writing and submitting a job for how to do this. If you need interactive use, read Submitting a job - Working interactively
- You are running a command on the login node that takes longer or needs more memory than you expected. Common causes include:
- Opening a large (more than 1GB) file with Vim (try using "less" instead)
- Compiling a large and complex source file
- Compressing a large file with gzip
At a minimum, the scheduler needs to know:
- How many CPU cores you need, and whether they must be on the same node
- If you don't know, the answer is probably "1 core". If the program supports parallelism, it probably supports multithreading - multiple cores on a single node. To use multiple nodes, a program generally needs MPI, so you should only request multiple nodes if you are sure the program can use them.
- How much memory you need
- NYU has nodes with 23GB, 46GB, 62GB, 90GB, 120GB and 189GB available to jobs. (The remaining memory is needed by the operating system)
- How long the job is expected to take
- NYU HPC users can request up to 168 hours (1 week) for a single job, but priority is given to jobs requesting less time.
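Put together in PBS terms, such a request might look like the following sketch (the values shown are placeholders, not recommendations):

```
#!/bin/bash
# Hypothetical request: 1 core on 1 node, 4GB of memory, 12 hours
#PBS -l nodes=1:ppn=1
#PBS -l mem=4gb
#PBS -l walltime=12:00:00
```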
CPUs - nodes and cores
HPC is associated with parallel processing - but it's not magic! To use multiple CPUs, the program must have been written with either threading (e.g. OpenMP) or message passing (MPI).
If in doubt:
- try with 1 core first
- check the documentation of the software, and next try with multiple cores on 1 node
- when using multiple nodes, check whether the job is actually running on all of the nodes. Contact us for help with this.
How much do I need?
The HPC cluster is not magic, its CPUs are only as fast as any other contemporary CPU. In fact, some nodes on Mercer are a few years old, and may be slower than your desktop (see Clusters July 2017 for a table of node types and when we installed them).
The performance of the HPC cluster comes from its scale. Most nodes in the cluster have 12 or 20 cores and between 48GB and 192GB of RAM, with access to a large, fast parallel filesystem and a 40Gb/s dedicated network link between any two nodes in each of the main groups. And there are thousands of nodes.
So although the resources your job needs depend very much on your job, and there is no simple rule for estimating requirements, you can make some initial guesses based on why you need the HPC cluster:
- My desktop does not have enough RAM
You should request at least as much RAM as your desktop possesses. The time required will probably be similar to the time required on your desktop. Be aware though that many problems scale with O(n^2) or more, so doubling the number of data points might require 4x the RAM and 8x the compute time.
- Each run takes 4 hours on my 32GB desktop, and I have 1000 experiments to run
Each experiment will probably take 32GB of memory and 4 hours on the HPC cluster too - but you can submit 1000 of them at once, and a few hundred might run simultaneously.
For a few one-off jobs, you can safely request much more than you need. But use these qsub/PBS options:
In the email you will see something like:
From which you can deduce that the job took 42 minutes of wallclock time and about 3GB of memory. So a sensible resource request for the next job is (see RESOURCES for more about the options):
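A sketch of such a request, with headroom over the observed 42 minutes and 3GB (the values are illustrative):

```
#PBS -l walltime=01:00:00
#PBS -l mem=4gb
```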
For further details, see Submitting a job - Resource limits
Options for delaying starting a job:
Delay starting this job until jobid has completed successfully.
Delay starting this job until after the specified date and time. Month (MM) and day-of-month (DD) are optional; hour and minute are required.
If you have jobs which must not be started until some other job has begun, completed or failed, you can set up a job dependency with -W depend=dependency. The most common dependency is "afterok": the job can start only after another job has completed with an exit code of zero (i.e. no errors). For example, to wait until job 12345 has completed successfully before starting:
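A sketch of the corresponding command (the script name next_job.pbs is hypothetical):

```
qsub -W depend=afterok:12345 next_job.pbs
```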
Other types of job dependencies, including dependencies on job arrays, can be found with "man qsub".
Job dependencies are useful when you have one job postprocessing the results of other jobs. Job dependencies also allow you to reserve resources more efficiently: such as single-CPU postprocessing of the result of a large parallel job.
Similarly, you can instruct Torque not to start a job before a given date and time with the command line option or PBS directive -a hhmm. The date/time format accepts day, month and year too. For example, to delay starting a job until midday on the first day of the next month:
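For instance, if the first of next month is July 1st, the command might look like this (my_job.pbs is a hypothetical script; the argument here uses the MMDDhhmm form):

```
qsub -a 07011200 my_job.pbs   # start no earlier than 12:00 on July 1st
```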
On earlier NYU HPC clusters, the output of batch jobs appeared as it happened, in a file like "my_job.o12345" in the job submission directory. When jobs are submitted from /scratch, this can cause a heavy small-block-I/O load on the Lustre filesystem, which impacts performance for all users.
On Mercer, job stdout and stderr are instead written to hidden files in your $HOME area, under $HOME/.pbs_spool. When the job completes, the my_job.o12345 and my_job.e12345 files are moved to their final location.
You can monitor your job's progress by looking for its output in this hidden directory.
If your job writes too much to stdout or stderr, these temporary files might (temporarily) fill up your $HOME space allocation. If this is occurring, first check whether you can reduce the amount of stdout and stderr (excessive output can slow down your program; contact us if you would like assistance with this). Another option is to redirect stdout and stderr to a file in some other location:
- With bash:
- With csh/tcsh:
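A sketch of both forms, with a hypothetical program my_program and a log file on /scratch:

```
# bash: redirect stdout and stderr to one file
./my_program > /scratch/$USER/my_job.log 2>&1

# csh/tcsh: >& redirects both stdout and stderr
./my_program >& /scratch/$USER/my_job.log
```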
To request GPU nodes:
1 node with 1 core and 1 GPU
1 node with 1 core and 1 GPU, specifically a Titan Black GPU
1 node with 1 core and 1 GPU, specifically an Nvidia K80 GPU
1 node with 4 Titan GPUs. Note that we request ppn=4 too; it is always best to request at least as many CPU cores as GPUs.
The available GPU node configurations are shown here.
When you request GPUs, the system will set two environment variables - we strongly recommend you do not change these:
- CUDA_VISIBLE_DEVICES has a comma-separated list of the device IDs this job is allowed to use (eg "2,3"). The CUDA library within the application will use this to prevent multiple GPU jobs on the same node from interfering with each other
- CUDA_DEVICES has a zero-based sequence of the "logical device IDs" for your job (eg "0 1"). So, if your application expects a list of GPU IDs starting at zero, and you have been allocated GPU numbers 2 and 3, then you can pass $CUDA_DEVICES to your application and it will see 2 devices, named "0" and "1", which happen to correspond (via $CUDA_VISIBLE_DEVICES) to the GPUs whose physical IDs are "2" and "3"
To your application, it will look like you have GPUs 0, 1, ... (up to as many GPUs as you requested). So if, for example, you request 2 GPUs and are allocated GPU 2 and GPU 3, you will have:
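In that situation the environment would contain something like the following (a sketch; these values are set by the batch system and shown here for illustration only):

```shell
# Values for a job allocated physical GPUs 2 and 3 - do not change them
export CUDA_VISIBLE_DEVICES=2,3   # physical device IDs the job may use
export CUDA_DEVICES="0 1"         # zero-based logical IDs seen by the app
```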
Now if your application calls "cudaSetDevice(0)", it will use the GPU that appears as device 0 but is actually device 2.
And a call to "cudaSetDevice(3)" will return an error, because as far as the application can see, the node only has 2 GPUs, numbered 0 and 1.
A1: You can
ssh to a specific login node. The login nodes on bowery are named
A2: You can ssh to a specific compute node if you have a job running on it. To find out which nodes your job is running on use:
You will see the usual qstat output followed by a list of the nodes and cores your job is allocated to. The list will look something like:
In this example, the list shows cores 11 and 10 on node
A: After submitting jobs, you can locate where your jobs are executing by running pbstop -u NetID. You can then monitor these jobs by logging in to the corresponding compute nodes and running top, which shows both CPU and memory consumption. If you find little memory left (or even little swap left) due to your job, you should increase the "ppn" number in your PBS script, or consider taking advantage of the nodes with larger memory.
(still to come)
There are usage examples for many popular software packages in /share/apps/examples on Mercer:
batch - An example batch job
blcr - Checkpoint-Restart facility for long jobs
comsol - Computational Fluid Dynamics
c-sharp - Language for the .NET/mono runtime environment
fluent - Computational Fluid Dynamics / Multiphysics package
gaussian - Chemistry package
matlab - For mathematical exploration
namd - Molecular dynamics
qchem-amber - Molecular dynamics
r - Interpreted language for statistics work
resource-usage - Shows minute-by-minute CPU and memory usage of a program
stata - Statistics package
A: To run Stata jobs, you need to:
(1) Prepare a Stata do-file, such as "stata-test.do", which might include:
StataMP, the parallel version of Stata, is also installed on the USQ cluster. To use multiple processors in StataMP, insert "set processors X" as a line in your do-file, where X is the number of processors; it should equal the "ppn" number in your PBS script.
(2) Create a PBS script, "run-stata.pbs", to run Stata jobs in batch mode. The content of this file can look like this:
Be sure to substitute your own "NetID" for NetID.
You need to change the paths according to your directory. Please refer to this page for more information.
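A sketch of what run-stata.pbs might contain (the module name and resource values are assumptions; adjust them to your environment):

```
#!/bin/bash
#PBS -l nodes=1:ppn=1,mem=4gb,walltime=04:00:00
#PBS -m ae -M NetID@nyu.edu

cd $PBS_O_WORKDIR
module load stata            # exact module name is an assumption
stata -b do stata-test.do    # batch mode; output goes to stata-test.log
```

Submit it with qsub run-stata.pbs.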
(3) Then submit the job by typing
A: To run a Gaussian job, you need to prepare a Gaussian input file, which might look like this:
and save it as "input.com".
You can run it from an interactive session if you expect your job to finish very soon. The command is:
Please use this script rather than loading the module "gaussian/intel/G03-E01" and executing g03 directly, because otherwise Gaussian will write scratch files to the default path and the system space might fill up.
You may copy the "run-g03.csh" script to any directory and even rename it for your convenience.
You may run serial Matlab jobs from an interactive session or in batch mode. In both cases, you need to load the module:
Basic R jobs
Multiple R versions exist in the HPC environment. To check which are available on Prince:
Suppose we want to use R 3.3.2; run these commands:
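A sketch of those commands (the exact module name is an assumption; 'module avail r' will show the real one):

```
module purge
module load r/intel/3.3.2    # module name is an assumption
module list                  # confirm R 3.3.2 and its dependencies loaded
```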
We first clean up the environment with 'module purge'. Then we load the selected R version and check what is loaded in the current environment. We can see that R 3.3.2 is indeed loaded, along with its dependency modules. Let's try this basic R example, which we name "example.R":
Below is the screen output while running it on Prince:
What is shown above is a simple demo on the login nodes. For real interactive analysis, users are encouraged to run on compute nodes, using the 'srun' command to request dedicated resources, e.g.:
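For instance, a request along these lines (the resource values are placeholders):

```
srun --nodes=1 --cpus-per-task=2 --mem=8GB --time=2:00:00 --pty /bin/bash
```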
Besides running analyses interactively, long-running and big-data-crunching jobs ought to be submitted to the batch system, Slurm. The "example.R" script can be submitted to Slurm to run in batch mode.
Copy example files to your newly created directory.
Below is how the example looks:
Then create an sbatch job script such as:
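A sketch of such a script, here called run-example.sbatch (the module name and resource values are assumptions):

```
#!/bin/bash
#SBATCH --job-name=exampleR
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4GB
#SBATCH --time=01:00:00
#SBATCH --output=example_%j.log

module purge
module load r/intel/3.3.2    # module name is an assumption
Rscript example.R
```

Submit with: sbatch run-example.sbatch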
Once the sbatch script file is ready, it can be submitted to the job scheduler using sbatch. After successful completion of the job, check the output log file for detailed output information.
The MPI modules on Mercer are built with support for the batch system. However, third-party MPI libraries and parallel software (such as Julia) may not be.
To launch a Julia job on multiple nodes you can use the node list provided by $PBS_NODEFILE:
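A sketch of such a launch (the --machinefile flag matches older Julia versions; check 'julia --help' for yours, and the script name is hypothetical):

```
julia --machinefile $PBS_NODEFILE my_script.jl
```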
For an MPI job that does not use the MPI modules on Mercer:
To run one multithreaded MPI process per node (hybrid MPI/OpenMP), see Running jobs - MPI.
Parallel libraries other than the MPI modules on Mercer do not normally have support for Torque, and consequently do not play nicely with other jobs. For this reason, the queue for multinode jobs sets the "#PBS -n" (node-exclusive) option. Multinode jobs therefore do not share nodes with any other job.
(thanks Spencer for the Julia tip!)
You need to have logged in with X forwarding enabled, as evince is an X application. See Logging in to the NYU HPC Clusters for how to do this.
On Mercer, enter 'myquota' at the prompt to see how much space you have used and how much is available on each filesystem.
An access control list (or ACL) gives per-file, per-directory and per-user control over who can read, write and execute files. You can see the ACL for a file or directory with the getfacl command:
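For example, using getfacl:

```
getfacl dummy.txt
```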
To modify permissions for files or directories, use setfacl. For a detailed description, see 'man setfacl'. In the example below, I give read permission on dummy.txt to another user:
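A sketch, with a hypothetical NetID ab1234:

```
# Give user ab1234 (a hypothetical NetID) read permission on dummy.txt
setfacl -m u:ab1234:r dummy.txt
getfacl dummy.txt    # verify the new ACL entry
```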
For setting execute permission on files - useful for scripts, and for allowing directories to be entered - chmod is still used.
For faster transfer between the HPC clusters at NYU in NYC and the BuTinah cluster at NYUAD, use scp over port 922. This will route the transfer over a high-bandwidth ANKABUT link rather than the default low-bandwidth MPLS link. The speed difference is greatest when pulling files from BuTinah to NYU NY.
Transferring many small files will still be slow - you will get better performance if you tar small files into a single archive, and transfer the tar file.
The default user environment on bowery sets an alias for scp which does this automatically, so in most cases you can skip over this section. If you are finding that file transfers between NYUAD and NYU are slow, you can check whether you are using the alias with 'which scp'. If the response is not '/usr/local/bin/scp_wrapper.sh', you should follow the instructions below.
You can use scp over port 922 directly with the following commands, initiated from any of the NYU HPC clusters in NYC:
Pushing to BuTinah:
Pulling from BuTinah:
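Sketches of both directions (the BuTinah hostname and paths shown are assumptions; note that scp takes the port with a capital -P):

```
# Pushing to BuTinah:
scp -P 922 myfile.tar NetID@butinah.abudhabi.nyu.edu:/scratch/NetID/

# Pulling from BuTinah:
scp -P 922 NetID@butinah.abudhabi.nyu.edu:/scratch/NetID/myfile.tar .
```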
Mercer has 'pigz', a parallel version of gzip. To use it:
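A sketch (the module name is an assumption; the -p option sets the number of threads):

```
module load pigz        # module name is an assumption
pigz -p 8 bigfile       # compress with 8 threads, producing bigfile.gz
unpigz bigfile.gz       # decompress
```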
Managing large numbers of files
Filesystems generally - and high-performance filesystems such as Lustre especially - perform best with a small to moderate number of medium to large files. Some specific issues to be aware of are:
- Lustre ($SCRATCH) gets performance mainly by striping: distributing a large file across several disks and several "object storage servers". File metadata operations, on the other hand, do not have much parallelism available. So a few large read or write operations are vastly faster than many small reads or writes. This is true for reads and writes within a single file as well as for reads or writes across many files.
- If your job does many small I/O operations, it might be better to copy the file to $PBS_JOBTMP or $PBS_MEMDISK at the start of the job, and open the local copy of the file.
- (But for large reads and writes, $SCRATCH is likely to be faster than local disk)
- The default stripe count on $SCRATCH is 4, so each file is striped across disks on 4 object storage servers. If you have a folder filled with files each smaller than 1MB, it is better not to stripe them. You can set the stripe count on a folder (under $SCRATCH) to 1 with:
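A sketch, using Lustre's lfs tool on a hypothetical folder:

```
lfs setstripe -c 1 /scratch/$USER/many_small_files
lfs getstripe /scratch/$USER/many_small_files   # verify the stripe count
```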
- Finding a file within a folder is a serial operation. And the more files in a folder, the longer it takes. With several thousand files, even 'ls' on that folder can take several minutes and may affect responsiveness of the filesystem for other users.
- If you have more than about 1000 files in one folder, distribute them over a number of subfolders. The best performance will be when the number of subfolders is the square root of the total number of files (eg, for 1 million files, 1000 subfolders each containing 1000 files)
- The backup regime on /archive is optimized for small numbers of large files - 1000 files of 1KB each take 1000 times as long to back up as 1 file of 1MB! Too many small files can prevent the backup from completing in time.
When archiving a collection of small files, please tar the files first. You can send a collection of files to /archive with the command:
And fetch it again with:
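Sketches of both commands, for a hypothetical folder myproject:

```
# Send a folder of small files to /archive as a single tar file
tar cf $ARCHIVE/myproject.tar myproject/

# Fetch and unpack it again later
tar xf $ARCHIVE/myproject.tar
```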
To replicate on Mercer a data directory you have on your workstation - assuming you are entering these commands in a local terminal on your workstation, and you have an SSH tunnel set up and running:
The host name followed by a colon tells rsync that the (in this case) destination is on another host. If your username on the other host is different from the username on the current host, you can specify the remote username with user@host:. Note the backslash in \$USER - this instructs the shell not to expand $USER to your local (on your workstation) username. An equivalent command is:
To copy in the other direction, from /scratch on Mercer to your workstation (again, from a local terminal on your workstation and across an SSH Tunnel):
Only those files not already up-to-date on your workstation will be copied.
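Sketches of both directions, assuming an SSH tunnel listening on local port 8026 (the port and paths are assumptions; use the values from your own tunnel setup):

```
# Workstation -> Mercer: replicate a local directory into /scratch
# (the backslash stops the local shell expanding $USER)
rsync -av -e "ssh -p 8026" ./mydata/ localhost:/scratch/\$USER/mydata/

# Mercer -> workstation: only files not already up to date are copied
rsync -av -e "ssh -p 8026" localhost:/scratch/\$USER/mydata/ ./mydata/
```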
First, download and install the WinSCP tool from here. If you are inside the NYU network (on campus), simply open WinSCP and fill in all the fields:
If you are outside the NYU network, one option is to set up and use the VPN; after that you can use WinSCP as described above. Another option is to start an SSH tunnel on your workstation - we have instructions on how to do this for Windows workstations. Once your SSH tunnel to dumbo is set up and started, open WinSCP and fill in the fields as shown below: