


What Cluster am I on?

 How can I show the cluster name in the command prompt?

Your administrator has defined the cluster name in the environment variable $CLUSTER. To show it in your prompt, add the following lines to your $HOME/.bashrc file:

# what cluster am I on?
PS1='['$CLUSTER': \u@\h \W]\$ '
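If $CLUSTER happens to be unset (for example on your own workstation), the line above produces a prompt like "[: user@host dir]$". A minimal bash sketch that falls back to a placeholder instead is below; the function name cluster_prompt and the placeholder "unknown" are hypothetical choices for this example, not something the clusters provide.

```shell
# Hypothetical helper for $HOME/.bashrc: build the prompt string from a
# cluster name (first argument, else $CLUSTER, else the placeholder "unknown").
cluster_prompt() {
  local name="${1:-${CLUSTER:-unknown}}"
  # \u, \h, \W and \$ are bash prompt escapes (user, host, basename of the
  # current directory, and "$" or "#"); printf turns each "\\" into "\".
  printf '[%s: \\u@\\h \\W]\\$ ' "$name"
}

# what cluster am I on?
PS1="$(cluster_prompt)"
```

On a cluster where $CLUSTER is prince, this yields the same prompt as the one-liner above.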


File/space quota usage.

 How much of the file/space quota have I used?

Run the myquota command in your ssh session to view your usage on each of the HPC filesystems. Here is an example of real output:

Filesystem  Environment  Backed up?   Allocation      Current Usage
Space       Variable     /Flushed?    Space / Files   Space(%) / Files(%)
/home       $HOME        Yes/No       20.0GB/-        1.71GB(8.54%)/-
/scratch    $SCRATCH     No/Yes       5.0TB/1.0M      42.92GB(0.84%)/102976(10.30%)
/beegfs     $BEEGFS      No/Yes       2.0TB/3.0M      0.00GB(0.00%)/0(0.00%)
/archive    $ARCHIVE     Yes/No       2.0TB/-         0.00GB(0.00%)/1

We will go through these filesystems in order. /home is the default path (~) when you ssh into a login node; its absolute path is /home/[YOUR_NETID]. This location is meant neither for heavy storage nor for analysis, which is why the quota is only 20GB. In the Allocation columns, Space is the maximum total storage allowed for your user and Files is the maximum number of files permitted; whichever limit you reach first is the one that constrains you. Pay close attention to whether a filesystem is flushed: if it is, you need to back up your files yourself periodically (ideally right after a job finishes running).

/scratch is for scratch work and analysis: it supports a large amount of data, and users are encouraged to run their tests on this filesystem. The absolute path is /scratch/[YOUR_NETID]; for example, if your NetID is mll469, run cd /scratch/mll469 to get there. Running myquota from any of these locations produces the same output.

/beegfs is for large numbers of small files (e.g. millions of files of a few kilobytes each).

/archive is for long-term storage of output. As the table shows, it is backed up and is not flushed.
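myquota reports totals per filesystem. To find out which directory is using up your Space or Files allocation, a bash sketch like the one below can help; usage_report is a hypothetical helper name, and it counts regular files only, so its number may differ from myquota's Files count (which may also count directories; that is an assumption here).

```shell
# Hypothetical helper: show disk usage and the number of regular files under
# a directory (default: $HOME), to compare against the Space/Files limits.
usage_report() {
  local dir="${1:-$HOME}" size files
  size=$(du -sh "$dir" 2>/dev/null | cut -f1)
  # tr strips the padding some wc implementations add
  files=$(find "$dir" -type f 2>/dev/null | wc -l | tr -d ' ')
  printf '%s\t%s\t%s files\n' "$dir" "$size" "$files"
}
```

For example, usage_report "$SCRATCH" prints the directory, its size and its file count separated by tabs. On a very large directory tree this can take a while, so prefer running it inside a job rather than on a login node.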



HPC Account

 How can I continue to access HPC after I leave/graduate from NYU?

As long as a full-time NYU faculty member is willing to support your HPC account, you remain eligible for NYU HPC access. If your NYU Home status has expired, please ask your NYU advisor to send an email to to extend your NYU Home and Google Apps access for another year so that you keep your HPC access. After that, to reset your NYU password, please go to the page

Then follow the instructions in the NYU HPC wiki to renew your HPC account.

 Can my non-NYU collaborator access and use the clusters maintained by the NYU HPC team?

Yes. If you are a full-time NYU faculty member, you can sponsor your non-NYU collaborator for an HPC account.

Step 1: Sponsor your collaborator for an NYU NetID

The process is described in the page NYU Affiliate Management. Basically, you (the NYU sponsor) need to visit the NYU Start Page and log in with your NetID and password. Once you have logged in, click the For Affiliate Sponsors: Request a new NetID or affiliation link in the menu. Fill out all the required fields on the form that pops up. There are detailed instructions on how to fill out the form in a Knowledge Base article.

Step 2: Request access to NYU Home and Google apps for your collaborator

Once your collaborator has an NYU NetID, send an email to requesting access to NYU Home and Google Apps for your collaborator.

Step 3: Activate the collaborator NetID and set password

Have your collaborator visit the NYU Start Page to activate the NetID and set their password.

Step 4: Ask your collaborator to apply for an NYU HPC account

Ask your collaborator to follow the instructions in the NYU HPC wiki to apply for an HPC account.


HPC File Systems

 Am I blocked from accessing the HPC gateway?

If you try to login remotely on the HPC gateways and you get an error similar to:

$ ssh
ssh_exchange_identification: read: Connection reset by peer

you may be blocked from accessing the HPC gateway.

This can happen if you fail to log in several times (for example, by entering the wrong password) from the same remote server. In that case, the IP address of the server you are logging in from is blocked.

If you suspect that you are blocked from accessing the HPC gateways please email the IP address of the server you attempt to login from to the HPC team (

To find the IP address of the server, search the web for “What is my IP address”.

 Can you help me recover files that I deleted accidentally?

Scratch file systems (Lustre and BeeGFS)

If you accidentally deleted files from the scratch file systems, they CANNOT be recovered. There are no backups or snapshots of the scratch file systems.

Home Directories

If you accidentally deleted files from your home directory, they can be recovered: you can restore them yourself from the snapshots.

The home directory snapshots can be found in the following directory: /home/.zfs/snapshot/

Example snapshot on home directory: /home/.zfs/snapshot/2018-10-19-213100/netid
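To find which snapshots still hold a deleted file, you can loop over the snapshot directories. The bash sketch below assumes the layout in the example above (snapshot root, then a dated snapshot name, then your NetID, which is normally $USER on the cluster); find_in_snapshots and the SNAPSHOT_ROOT override are hypothetical names introduced only for this example.

```shell
# Hypothetical helper: print every snapshot copy of a file from your home
# directory. The argument is the path relative to your home directory.
# SNAPSHOT_ROOT is a hypothetical override; it defaults to the path above.
find_in_snapshots() {
  local relpath="$1"
  local root="${SNAPSHOT_ROOT:-/home/.zfs/snapshot}"
  local snap
  # The trailing slash makes the glob match directories only; dated names
  # such as 2018-10-19-213100 expand oldest first.
  for snap in "$root"/*/; do
    if [ -e "${snap}${USER}/${relpath}" ]; then
      printf '%s\n' "${snap}${USER}/${relpath}"
    fi
  done
}
```

To restore the newest copy of a (hypothetical) file project/notes.txt, keeping its timestamps: cp -p "$(find_in_snapshots project/notes.txt | tail -n 1)" ~/project/notes.txt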

Archive file system

If you accidentally deleted files from your archive file system


 How much of my file/space quota have I used?

On Prince, enter 'myquota' at the prompt to see how much space you have used and how much is available on each filesystem. For example:

[ad4315@log-1 ~]$ myquota

Hostname: log-1 at Tue Oct 16 14:45:23 EDT 2018

Filesystem  Environment  Backed up?   Allocation      Current Usage
Space       Variable     /Flushed?    Space / Files   Space(%) / Files(%)
/home       $HOME        Yes/No       20.0GB/-        14.27GB(71.36%)/-
/scratch    $SCRATCH     No/Yes       5.0TB/1.0M      48.73GB(0.95%)/54719(5.47%)
/beegfs     $BEEGFS      No/Yes       2.0TB/3.0M      0.00GB(0.00%)/0(0.00%)
/archive    $ARCHIVE     Yes/No       2.0TB/-         0.00GB(0.00%)/3

 How do I use rclone to copy files between Prince storage and Google Drive or other cloud storage?

Please see the wiki page dedicated to rclone usage.

 What happened to my data on /scratch?

The /scratch filesystem is a short-term filesystem providing fast I/O for running jobs. It is not backed up, and files that remain unused for a period of time are flushed. See Storage July 2017, especially the /scratch policy, for further information.
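To see which of your files are at risk before they are flushed, you can list files that have not been accessed recently. This is a bash sketch assuming the flush is based on access time and a 60-day window (the figure quoted in the account-expiry answer on this page); check the /scratch policy for the exact rule. stale_files is a hypothetical helper name.

```shell
# Hypothetical helper: list files under a directory (default: $SCRATCH) that
# were last accessed more than N days ago (default: 60).
# Assumption: the flush uses access time; see the /scratch policy.
stale_files() {
  local dir="${1:-$SCRATCH}" days="${2:-60}"
  find "$dir" -type f -atime +"$days" -print
}
```

Running stale_files "$SCRATCH" 50 shows files that would cross a 60-day limit within roughly ten days; copy anything you still need to $ARCHIVE or your workstation.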

 Where is my folder on the HPC archive file system?

All HPC users should have a folder on the archive file system. The folder is:


For example, if your NetID is ad4315, your archive folder is:  /archive/a/ad4315

The environment variable $ARCHIVE points to your archive folder. To see the contents (files and directories) of your archive folder, run the following commands on a Prince login node:

[ad4315@log-0 ~]$ echo $ARCHIVE
[ad4315@log-0 ~]$ ls -l $ARCHIVE
total 4
drwxr-s--- 2 ad4315 ad4315 4096 Oct 15 17:36 oldDir
-rw-r----- 1 ad4315 ad4315    0 Oct 15 17:37 testFile

 If you still have trouble accessing your archive folder, please send the following information to the HPC team (

  1. Your NetID

  2. The output of the following two commands (on the Prince login node)
    1. echo $ARCHIVE
    2. ls -l $ARCHIVE
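If it is easier, you can collect the three items above into one block of text to paste into your email. The bash function below is a hypothetical convenience (archive_report is not an installed command); it just runs the same two commands and labels their output.

```shell
# Hypothetical helper: gather the information the HPC team asks for
# (NetID, echo $ARCHIVE, ls -l $ARCHIVE) into one labelled report.
archive_report() {
  echo "NetID: $USER"
  echo '$ echo $ARCHIVE'
  echo "$ARCHIVE"
  echo '$ ls -l $ARCHIVE'
  ls -l "$ARCHIVE" 2>&1   # include any error message in the report
}
```

Run it on a Prince login node, for example as archive_report > archive_report.txt, and attach or paste the file.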

Something went wrong!

 My account expired! What should I do? Is my data gone forever?

You can renew your account even after it has expired; see Getting or renewing an HPC account for how to renew. Data on /home, /archive and /work is deleted 90 days after account expiration. Data on /scratch is deleted after 60 days of not being used, whether your account is current or expired.

 Why is "ls" on /scratch so slow?
Lustre stores the file itself and the file metadata (its name, size, etc) separately. When you issue a simple 'ls' command, a remote procedure call (RPC) is made to the metadata server (MDS), which returns a list of the files in the current directory. If you add certain options, such as -l or --color=tty, then for each file in the list, ls will call stat() on that file. The stat() call involves an RPC to the MDS and an RPC to the object storage server (OSS) which holds the file itself. These RPCs, especially those to the OSS, can take a long time.
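Following the explanation above, listing names only (no -l, no color) avoids the per-file stat() calls. Many default environments alias ls to ls --color=auto, so the bash sketch below bypasses any alias with the builtin command; fast_ls is a hypothetical name, and whether a given ls build avoids stat() entirely depends on its implementation.

```shell
# Hypothetical helper: list file names only, bypassing any "ls --color"
# alias so that ls does not need to stat() every file in the directory.
fast_ls() {
  command ls --color=never -1 "$@"
}
```

For example, fast_ls /scratch/$USER/big_dir | wc -l counts the entries in a large directory without the extra metadata lookups.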

 I can't log in

I get a message about bad permissions

SSH is fussy about permissions on the $HOME/.ssh directory. The permissions should look like this:

[ab123@myworkstation ~]$ ls -la $HOME
total 545460
drwx------ 11 ab123 users      4096 Sep  4 11:19 .
drwxr-xr-x  3 root  root          0 Aug 28 14:44 ..
drwx------  2 ab123 users      4096 Aug  8 11:19 .ssh

Note that .ssh has rwx permission for the owner, and no permission at all for anyone else. You can set these permissions with the command:

[ab123@myworkstation ~]$ chmod 700 $HOME/.ssh
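The listing above also shows the home directory itself with no group or world write permission, which sshd checks as well. A bash sketch that applies all of this in one go is below; fix_ssh_perms is a hypothetical helper, and setting authorized_keys to 600 is a common convention assumed here rather than something stated on this page.

```shell
# Hypothetical helper: apply the permissions sshd expects on a home
# directory (default: $HOME) and its .ssh directory.
fix_ssh_perms() {
  local home="${1:-$HOME}"
  chmod go-w "$home"        # home must not be group- or world-writable
  chmod 700 "$home/.ssh"    # rwx for the owner, nothing for anyone else
  if [ -f "$home/.ssh/authorized_keys" ]; then
    chmod 600 "$home/.ssh/authorized_keys"  # readable/writable by owner only
  fi
}
```

Run fix_ssh_perms with no argument on the machine where the permissions are wrong, then try logging in again.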



 When trying to login, I get warnings about "HOST IDENTIFICATION HAS CHANGED"
 Recent versions of OSX and Ubuntu ship a newer version of ssh than the NYU HPC clusters. You can prevent the warnings with two steps:
  1. Update .ssh/config on your workstation according to the example here.
  2. Delete your .ssh/known_hosts file. The first time you connect afterwards, you will be asked about connecting to a new host; you can safely answer "yes".

 Q: In the library, my wireless connection keeps dropping out. How can I fix it?
A1: If you are using a Mac:

OSX roams "aggressively", which means that if it can see multiple access points it will abandon a working connection to pursue another one which might be better. The Bobst library supports a lot of wireless users and thus has many wireless access points, and Mac laptops behave like an undisciplined child in a candy store, authenticating to one point only to then disconnect and try another. Some actions which might help:

  • Under Network preferences->WiFi->Advanced, remove all NYU networks except "nyu" from the "preferred networks" list, and move "nyu" to the top of the list
  • If that fails, you can disable the aggressive roaming from the terminal, with the command:

    sudo defaults write /Library/Preferences/ disabled -bool true


A2: If you are using Windows:

  • If you are running the PeerGuardian personal firewall software, switch it off (it disables DHCP). Otherwise:

Recent versions of Windows take a supposedly-more-secure but also less reliable approach to authenticating to a wireless network, which causes network connections to be dropped unnecessarily. A pop-up bubble in the bottom corner of the screen which says "please re-enter your password" is an indication that this is happening.

  • Instead of using the Windows-supplied wifi drivers, download and install the most recent driver from the manufacturer of your wireless-network-interface card

A3: if all this fails:

Come see the DS helpers - they may have another trick or two up their sleeve.


 Q: I used "module load" and it failed with a "module: command not found" error
 Normally the location of the module command is set up when the shell is started, but under some circumstances that startup procedure can be bypassed. If you get this error you can explicitly prepare your environment for modules with one of the following commands:
  • If your script (or interactive environment) uses bash (the default) or sh or ksh:

    source /etc/profile.d/
  • If your script (or interactive environment) uses csh or tcsh:

    source /etc/profile.d/env-modules.csh

In the case of a PBS job script, add one of the above lines before the first "module" command in your script.

If you are seeing the error in an interactive shell, run one of the above commands at the prompt, then attempt the "module load" command again.

 Warning: no access to tty (Bad file descriptor), Thus no job control in this shell
This warning is harmless and does not indicate an error. It simply means that you are running a script (rather than a binary) in a job that has no access to a TTY. In other words, you cannot interrupt it (^C), suspend it (^Z) or use other interactive commands, because there is no screen or keyboard to interact with it. It can be safely ignored.

 I get the error "Warning: no display specified." when I use the -X flag with ssh

Preparing your Mac for X

If you wish to use any software with a graphical interface, you will need an X server. This is a software package that draws on your local screen windows created on a remote computer (such as an NYU HPC cluster).


Preparing your Windows workstation for X

If you wish to use any software with a graphical interface, you will need an X server. This is a software package that draws on your local screen windows created on a remote computer (such as an NYU HPC cluster). There are a couple of options out there:

  • We recommend Cygwin/X. Instructions for downloading and installing it can be found here.
    Before starting PuTTY, you will need to have the X server running, by double-clicking the "XWin Server" shortcut under Cygwin-X on the Start Menu. You may wish to add this to your Windows Startup folder so it runs automatically after starting Windows.
  • Another good option is Xming. Installation instructions can be found on its web site.
    As with Cygwin/X, you will need to launch Xming before starting PuTTY.

You will also need to download and install PuTTY SSH if you have not already.
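Once you are logged in, you can check whether X forwarding actually took effect: with ssh -X and a running local X server, the DISPLAY variable on the cluster is normally set. The bash sketch below is a hypothetical check (check_x_display is not an installed command) that you could run before starting a graphical program.

```shell
# Hypothetical helper: report whether an X display is available in this
# session. An empty $DISPLAY usually means X forwarding is not working.
check_x_display() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X display: $DISPLAY"
  else
    echo "No X display: reconnect with ssh -X and make sure your local X server is running" >&2
    return 1
  fi
}
```

Because it returns a non-zero status when no display is set, it can guard a command, e.g. check_x_display && evince myfile.pdf.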

 Who killed my job, and why?
The most likely culprit is the batch system. The other prime suspect is us, the HPC system administrators.

If the batch system killed your job, it probably did so because the job exceeded the amount of memory or CPU time requested (which may have been a default, if you did not request these explicitly). See Submitting a job - Resource limits for help on this, and this overview of scheduling for a more general introduction.

If we killed your job, it was probably to prevent the system from crashing, which can happen when a job's runtime behavior puts a certain type of load on the system. The most common trouble is non-optimal use of the /scratch filesystem, described at the bottom of the table on the Storage page. The next most common reason is that you were running your job on the login node rather than through the batch system; see Running jobs - Nodes for more on this. There are a few circumstances where the login node is the only option, especially archiving files to your $ARCHIVE directory. If you experience trouble with this, please contact us.

The /scratch filesystem is configured for large-block I/O, such as sequential reading and writing of large files. However, individual I/O operations are relatively costly, so programs that make frequent, small I/O accesses put a heavy load on the metadata servers, which in extreme cases can make the system unstable. The system administrators generally detect this quickly and may kill a job whose I/O pattern is stretching the capabilities of the filesystem (if this happens, we will contact you to help configure your job for better stability and performance).

 I got an email "Please do not run jobs on login nodes"
 The login nodes are a shared resource intended for editing scripts, compiling small amounts of code and moving data about. This is enforced via process size limits - if a command on a login node runs for too long or uses too much memory, the system will kill it and send you this email. The likely causes are:
  • You are trying to run a simulation interactively instead of via the batch system. Please read Running jobs on the NYU HPC clusters, especially Writing and submitting a job for how to do this. If you need interactive use, read Submitting a job - Working interactively
  • You are running a command on the login node that takes longer or needs more memory than you expected. Common causes include:
    • Opening a large (more than 1GB) file with Vim (try using "less" instead)
    • Compiling a large and complex source file
    • Compressing a large file with gzip
In most cases the best solution is to start an interactive session on a compute node, requesting sufficient resources for the task at hand - see Submitting a job - Working interactively

Running Jobs

 What resources can and should I request?


For further details, see Submitting a job - Resource limits

 Can I make sure a job gets executed only after another one completes?

A: Yes

This page is retained from an earlier version of the HPC wiki only for reference.
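On Prince, which uses Slurm (as in the sbatch examples elsewhere on this page), this is done with a job dependency. The bash sketch below uses the real sbatch options --parsable (print only the job ID) and --dependency=afterok:JOBID (start only after that job finishes successfully); the helper name submit_chain and the script names are hypothetical.

```shell
# Hypothetical helper: submit two Slurm job scripts so that the second one
# starts only if the first one finishes successfully (afterok dependency).
submit_chain() {
  local first="$1" second="$2" jobid
  # --parsable prints "jobid" (or "jobid;cluster" on multi-cluster setups)
  jobid=$(sbatch --parsable "$first") || return 1
  jobid="${jobid%%;*}"
  sbatch --dependency=afterok:"$jobid" "$second"
}
```

For example, submit_chain preprocess.sbatch analyze.sbatch; squeue -u $USER will then show the second job pending with reason "Dependency" until the first completes.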

 Where did my job output go?


 How do I use GPUs?


 How do I log in to a specific node?

A1: You can ssh to a specific login node. The login nodes on bowery are named login-0-0, login-0-1, login-0-2, login-0-3.

$ ssh login-0-0

A2: You can ssh to a specific compute node if you have a job running on it. To find out which nodes your job is running on use:

$ qstat -n jobid

You will see the usual qstat output followed by a list of the nodes and cores your job is allocated to. The list will look something like:


In this example, the list shows cores 11 and 10 on node compute-6-3.

 How can I ensure my resource-intensive job is running smoothly?

A: After submitting jobs, you will be able to locate where your jobs are executing by running pbstop -u NetID. Then you can monitor these jobs by logging in to the corresponding compute nodes and running top. You will then see both CPU and memory consumption. If you find little memory left (or even little swap left) due to your job, you should increase the "ppn" number in your PBS script or consider using the nodes with larger memory.


 My job will take longer than 48 hours, what should I do?

For long-running jobs, we suggest using Checkpoint-Restart to split the job into a series of shorter jobs. We realize though that this is not always possible - if you need the walltime limit for a job extended, contact us.  

 My job needs (MySQL, some other service) to be running
 I want to run a job at 9am every day

(still to come) 

 How do I run a STATA job?

 How do I run a Gaussian job?

 How do I run a Matlab job?

 How do I run a R job?

Basic R jobs

Multiple R versions exist in the HPC environment. To check which are available on Prince, run:

$ module avail r

----------------------------------------------------- /share/apps/modules/modulefiles ---------------------------------------------
r/intel/3.0.3                       ray/openmpi/intel/20160114          reprozip/intel/1.0.3                rpy2/intel/2.5.6
r/intel/3.1.2                       rdkit/intel/201409.2                requests/2.7.0                      rsem/intel/1.2.15
r/intel/3.2.0                       rdp_classifier/2.11                 ribopicker/0.4.3                    rseqc/intel/2.3.9
r/intel/3.2.2                       recon/intel/1.08                    rmblast/2.2.28                      rstudio/0.98.1028
randfold/intel/2.0                  rendertoolbox3/2.1-18               rose/20151118                       rtax/0.984
raxml/intel/7.3.0                   repeat_masker/4.0.5                 rosetta/intel/54167                 ruby/gnu/2.1.1
raxml/intel/8.0.23                  repeatmodeler/1.0.8                 rosetta/openmpi/intel/2014.35.57232
raxml/intel/8.2.5                   repeatscout/intel/1.0.5             rosetta/openmpi/intel/54167

Suppose we want to use R 3.3.2. Run these commands:

$ module purge
$ module list
No Modulefiles Currently Loaded.
$ module load r/intel/3.3.2
$ module list
Currently Loaded Modulefiles:
  1) intel/14.0.2          4) cairo/gnu/1.12.16     7) mpfr/gnu/3.1.2       10) pcre/intel/8.34      13) openssl/gnu/1.0.1g   16) r/intel/3.3.2
  2) zlib/intel/1.2.8      5) expat/intel/2.1.0     8) mpc/gnu/1.0.2        11) icu/gnu/52.1         14) curl/intel/7.38.0
  3) bzip2/intel/1.0.6     6) gmp/gnu/5.1.3         9) gcc/4.8.2            12) libxml2/intel/2.9.1  15) openmpi/intel/1.6.5
$ R

We first clean up the environment with 'module purge'. Then we load the selected R version and check what is loaded in the current environment. We can see that R 3.3.2 is indeed loaded, along with its dependency modules. Let's try this basic R example, which we name "example.R":

df <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
indices <- order(df$x)

Below is the screen output when running it interactively on Prince:

$ R

R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-centos-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> df <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
> df
  x  y
1 1  7
2 2 19
3 3  2
4 1  2
> indices <- order(df$x)
> order(df$x)
[1] 1 4 2 3
> df[indices,]
  x  y
1 1  7
4 1  2
2 2 19
3 3  2
> df[rev(order(df$y)),]
  x  y
2 2 19
1 1  7
4 1  2
3 3  2
> quit()
Save workspace image? [y/n/c]: n

What is shown above is a simple demo on a login node. For real interactive analysis, please run on compute nodes, using the 'srun' command to request dedicated resources, e.g.:

$ srun --x11 --nodes=1 --ntasks-per-node=4 --mem=4000 -t2:00:00 --pty /bin/bash
$ xterm

$ module load r/intel/3.3.2
$ R

Besides running analysis interactively, you should submit long-running and data-intensive jobs to the Slurm batch system. The "example.R" script can be submitted to Slurm to run in batch mode.

Create a working directory and copy the example file into it:

mkdir -p /scratch/$USER/example
cp /share/apps/examples/r/basic/example.R /scratch/$USER/example

The example file looks like this:

$ cat example.R
df <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
indices <- order(df$x)

Then create an sbatch job script:

$ cat run-R.sbatch
#!/bin/bash
#SBATCH --job-name=RTest
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem=2GB
#SBATCH --time=01:00:00
module purge
module load r/intel/3.3.2
cd /scratch/$USER/example
R --no-save -q -f example.R > example.out 2>&1

Once the sbatch script file is ready, submit it to the job scheduler with sbatch. After the job completes successfully, check the output file example.out for the results.

sbatch run-R.sbatch




 How do I run a Mathematica job?

 How do I start a multinode parallel job that is NOT MPI (eg Julia)?

The MPI modules on Mercer are built with support for the batch system. However, third-party MPI libraries and parallel software (such as Julia) may not be.

To launch a Julia job on multiple nodes you can use the node list provided by $PBS_NODEFILE:

julia --machinefile $PBS_NODEFILE ./my_code.jl

For an MPI job that does not use the MPI modules on Mercer:

mpirun -np $(wc -l < $PBS_NODEFILE) --machinefile $PBS_NODEFILE ./my_mpi_exec.exe

To run one multithreaded MPI process per node (hybrid MPI/OpenMP), see Running jobs - MPI.

Parallel libraries other than the MPI modules on Mercer do not normally support Torque, so they do not play nicely with other jobs. For this reason, the queue for multinode jobs sets the "#PBS -n" (node-exclusive) option. Multinode jobs therefore do not share nodes with any other job.

(thanks Spencer for the Julia tip!)
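$PBS_NODEFILE contains one line per allocated core, so a host name appears once for every core you have on that node. A bash sketch for checking what a job actually received is below; nodefile_summary is a hypothetical helper name.

```shell
# Hypothetical helper: summarize a Torque/PBS node file (default:
# $PBS_NODEFILE), which lists one line per allocated core.
nodefile_summary() {
  local f="${1:-$PBS_NODEFILE}"
  echo "cores: $(wc -l < "$f" | tr -d ' ')"
  echo "nodes: $(sort -u "$f" | wc -l | tr -d ' ')"
}
```

If you need a machine file with one entry per node instead (for example, to start one process per node), sort -u "$PBS_NODEFILE" > hosts.txt produces it; hosts.txt is a placeholder name.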

 How can I view a PDF file on Prince?

evince myfile.pdf

You need to have logged in with X forwarding enabled, as evince is an X application. See Logging in to the NYU HPC Clusters for how to do this.

Using Software

There are usage examples for many popular software packages in /share/apps/examples on Prince:
  • batch - An example batch job
  • blcr  - Checkpoint-Restart facility for long jobs
  • comsol  - Computational Fluid Dynamics
  • c-sharp  - Language for the .NET/mono runtime environment
  • fluent  - Computational Fluid Dynamics / Multiphysics package
  • gaussian - Chemistry package
  • matlab  - For mathematical exploration
  • namd  - Molecular dynamics
  • qchem-amber  - Molecular dynamics
  • r  - Interpreted language for statistics work
  • resource-usage  - Shows minute-by-minute CPU and memory usage of a program
  • stata - Statistics package

Managing Data

 How do I give my colleague access to my files?

 An access control list (or ACL) gives per-file, per-directory and per-user control over who can read, write and execute files. You can see the ACL for a file or directory with the getfacl command:

$ getfacl myfile.txt

To modify permissions for files or directories, use setfacl. For a detailed description, see 'man setfacl'. In the example below, I give read permission on myfile.txt to user bob123:

$ setfacl -m u:bob123:r myfile.txt

 For setting execute permission on files - useful for scripts, and for allowing directories to be entered - chmod is still used.


 How do I get the best transfer speed to or from BuTinah?

For faster transfer between the HPC clusters at NYU in NYC and the BuTinah cluster at NYUAD, use scp over port 922. This will route the transfer over a high-bandwidth ANKABUT link rather than the default low-bandwidth MPLS link. The speed difference is greatest when pulling files from BuTinah to NYU NY.

Transferring many small files will still be slow - you will get better performance if you tar small files into a single archive, and transfer the tar file.

The default user environment on bowery sets an alias for scp which does this automatically, so in most cases you can skip over this section. If you are finding that file transfers between NYUAD and NYU are slow, you can check whether you are using the alias with 'which scp'. If the response is not '/usr/local/bin/', you should follow the instructions below.

You can scp over port 922 directly with the following commands, initiated from any of the NYU HPC clusters in NYC:

Pushing to BuTinah:

$ scp -P 922 filename

Pulling from BuTinah:

$ scp -P 922 .


 I have a huge amount of data that I want to compress for storage or transfer

 Mercer has 'pigz', which is a parallel version of gzip. To use it:

module load pigz/intel/2.3.1

pigz --help
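A common pattern is to pipe tar into pigz so a whole directory is packed and compressed in parallel. The bash sketch below uses pigz's -p option (number of threads) and falls back to plain gzip when pigz is not on the PATH; compress_dir is a hypothetical helper. Compressing a large file is one of the things that gets killed on login nodes, so run this inside a batch or interactive job.

```shell
# Hypothetical helper: pack DIR into DIR.tar.gz (written next to DIR), using
# pigz with THREADS threads when available, else single-threaded gzip.
# Usage: compress_dir DIR [THREADS]
compress_dir() {
  local dir="${1%/}" threads="${2:-$(nproc)}"
  local parent name
  parent=$(dirname "$dir")
  name=$(basename "$dir")
  if command -v pigz >/dev/null 2>&1; then
    tar -cf - -C "$parent" "$name" | pigz -p "$threads" > "$dir.tar.gz"
  else
    tar -cf - -C "$parent" "$name" | gzip > "$dir.tar.gz"
  fi
}
```

The result is an ordinary gzip file, so tar xzf DIR.tar.gz (or pigz -d) unpacks it.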

 My workflow uses thousands of small files, how should I manage them

Managing large numbers of files

Filesystems generally - and high-performance filesystems such as Lustre especially - perform best with a small to moderate number of medium to large files. Some specific issues to be aware of are:


  • Lustre ($SCRATCH) gets its performance mainly from striping: distributing a large file across several disks and several "object storage servers". File metadata operations, on the other hand, do not have much parallelism available. So a few large read or write operations are vastly faster than many small ones. This is true for reads and writes within a single file as well as for reads or writes on many files.
    • If your job does many small I/O operations, it might be better to copy the file to $PBS_JOBTMP or $PBS_MEMDISK at the start of the job, and open the local copy of the file.
    • (But for large reads and writes, $SCRATCH is likely to be faster than local disk)

  • The default stripe count on $SCRATCH is 4, so each file is striped across disks on 4 object storage servers. If you have a folder filled with files each smaller than 1MB, it is better not to stripe them. You can set the stripe count on a folder (under $SCRATCH) to 1 with:

    lfs setstripe -c 1 $SCRATCH/folder_with_small_files/
  • Finding a file within a folder is a serial operation. And the more files in a folder, the longer it takes. With several thousand files, even 'ls' on that folder can take several minutes and may affect responsiveness of the filesystem for other users.
    • If you have more than about 1000 files in one folder, distribute them over a number of subfolders. The best performance will be when the number of subfolders is the square root of the total number of files (eg, for 1 million files, 1000 subfolders each containing 1000 files)


  • The backup regime on /archive is optimized for small numbers of large files: 1000 files of 1kB each take 1000 times as long to back up as 1 file of 1MB! Too many small files can prevent the backup from completing in time.
    • When archiving a collection of small files, please tar the files first. You can send a collection of files to /archive with the command:

      # to archive $SCRATCH/my_run_dir
      tar cvf $ARCHIVE/simulation_01.tar -C $SCRATCH my_run_dir

      And fetch it again with:

      # to unpack it back into $SCRATCH/my_run_dir
      tar xvf $ARCHIVE/simulation_01.tar -C $SCRATCH 
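The square-root rule for subfolders described above can be applied with a short script. This is a minimal bash sketch, not an HPC-provided tool: shard_dir and the bucket_NNN folder names are hypothetical, it only moves regular, non-hidden files at the top level of the folder, and running one mv per file is slow for very large counts, so treat it as an illustration.

```shell
# Hypothetical helper: spread the regular files in DIR round-robin over
# ceil(sqrt(N)) subfolders named bucket_000, bucket_001, ... (N = file count).
shard_dir() {
  local dir="${1%/}" f b i=0 buckets=1
  local -a files=()
  # Collect the file list first, so moving files cannot disturb the listing.
  for f in "$dir"/*; do
    if [ -f "$f" ]; then
      files+=("$f")
    fi
  done
  local n=${#files[@]}
  # Smallest integer whose square is >= n, i.e. ceil(sqrt(n)).
  while [ $((buckets * buckets)) -lt "$n" ]; do
    buckets=$((buckets + 1))
  done
  for f in "${files[@]}"; do
    b=$(printf 'bucket_%03d' $((i % buckets)))
    mkdir -p "$dir/$b"
    mv -- "$f" "$dir/$b/"
    i=$((i + 1))
  done
}
```

With 1 million files this creates 1000 subfolders of about 1000 files each, matching the example above; remember to update your job scripts to the new paths.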

 I want to keep a folder on the HPC cluster in sync with a folder on my workstation


  • To replicate on Mercer a data directory you have on your workstation - assuming you are entering these commands on a local terminal on your workstation, and you have an SSH Tunnel set up and running:

    $ hostname
    $ ls -F
    $ rsync -av my_input_data mercer:/scratch/\$USER/my_run_dir

    The host name followed by a colon tells rsync that the destination (in this case) is on another host. If your username on the other host is different from your username on the current host, you can specify the remote username with username@remotehost:
    Note the backslash in \$USER - this instructs the shell not to expand $USER to your local (on your workstation) username. An equivalent command is: 

    $ ls -F
    $ rsync -av my_input_data NetID@mercer:/scratch/NetID/my_run_dir

  • To copy in the other direction, from /scratch on Mercer to your workstation (again, from a local terminal on your workstation and across an SSH Tunnel):

    $ hostname
    $ rsync -av mercer:/scratch/\$USER/my_run_dir my_results
    $ ls my_results

    Only those files not already up-to-date on your workstation will be copied.

Click here for more about using rsync on the NYU HPC clusters 

 How do I transfer files to Dumbo cluster from Windows workstation?

First, download and install WinSCP tool from here. If you are inside NYU network (on campus), simply open WinSCP and fill in all the fields:

If you are outside of NYU network, one option is to set up and use VPN. After that you can use WinSCP as described above. Another option is to  start an SSH tunnel on the workstation. We have instructions on how to do this for Windows workstations. Once your SSH tunnel is set up and started on dumbo, open WinSCP and fill the fields in as shown below :

 How can I share data on Prince with external (non-NYU) collaborators?

Please use the Globus data sharing feature. As long as external collaborators have the Globus access, they can download files from directories you set up for data sharing on prince. The setup procedure is described on this page -

Jupyter Notebooks

 From a Windows computer how to connect to a notebook running inside a Slurm job in the Prince cluster
To get access to the Jupyter notebook from your Windows computer, assuming:
  • You are within NYU network. If not, you may set up VPN to get onto NYU network.
  • In Slurm job stdout file slurm-9999999.out, the instruction is:
    ssh -L 6217:localhost:6217 <netid>

Start a new PuTTY session, then follow the steps below -  
1. Enter hostname '' and port 22.
2. On the left side of the PuTTY session window, click 'SSH' then 'Tunnels'. Enter the source port and destination from the ssh -L line in the Slurm stdout file (for the example above, Source port 6217 and Destination localhost:6217), then click 'Add'.
3. Click 'Open' to start the connection.
4. Enter username, password to get onto a Prince login node.
5. Go to your browser, enter the URL as provided in the Slurm stdout file.
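If you have a Unix-like shell available (for example on the login node, before starting PuTTY on Windows), you can pull the tunnel settings out of the Slurm stdout file instead of reading them by eye. The bash sketch below assumes the line has the form shown above, ssh -L PORT:HOST:PORT ...; tunnel_ports is a hypothetical helper, and the labels match PuTTY's "Source port" and "Destination" fields.

```shell
# Hypothetical helper: print the PuTTY tunnel settings found in the first
# "ssh -L PORT:HOST:PORT" line of a Slurm stdout file.
tunnel_ports() {
  sed -n 's/.*ssh -L \([0-9][0-9]*\):\([^ ]*\).*/Source port: \1, Destination: \2/p' "$1" | head -n 1
}
```

For the example above, tunnel_ports slurm-9999999.out prints "Source port: 6217, Destination: localhost:6217".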