Getting Started on HPC Prince Cluster
Logging in to Prince is the same two-stage process as logging in to any other NYU HPC cluster. The cluster host name is 'prince.hpc.nyu.edu'.
The HPC clusters (Prince and Dumbo) are not directly visible from the internet (outside the NYU network). If you are outside NYU's network (off campus), you must first log in to a bastion host.
The diagram below illustrates the login path.
The clusters can still access the internet directly. This may be useful when copying data from servers outside the NYU Network - see: How to copy files to and from the HPC clusters.
Alternatively, instead of logging in to the bastion host, you can use the VPN to get inside NYU's network and access the HPC clusters directly. Instructions on how to install and use the VPN client are available here.
You can't do anything on the bastion host except ssh to the HPC clusters.
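As a minimal sketch of the two-stage login (replace NetID with your own NetID; the bastion host name is written as a placeholder here - use the name given in the NYU HPC documentation):

```
# Stage 1 (only needed from outside the NYU network): log in to the bastion host
ssh NetID@<bastion-host>         # <bastion-host> is a placeholder for the actual bastion host name

# Stage 2: from the bastion host (or directly, if on campus or on the VPN), log in to Prince
ssh NetID@prince.hpc.nyu.edu
```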
Compute and Storage Facilities
Prince has 2 login and 385 compute nodes:
HPC Cluster Prince
2 login nodes: prince0.hpc.nyu.edu and prince1.hpc.nyu.edu
Each login node has 2 Intel Xeon E5-2680v4 2.4GHz CPUs ("Broadwell", 14 cores/socket, a total of 28 cores per login node) and 128 GB of memory
The 385 compute nodes fall into several categories:
- Standard Compute Nodes
- Nodes equipped with NVIDIA GPUs
- Medium Memory Node
- High Memory Nodes
|Total Nodes||387 (385 compute nodes + 2 login nodes)|
|CPU cores||8928 cores on compute nodes + 56 cores on login nodes|
|GPUs||50 NVIDIA K80 (24 GB) + 16 NVIDIA GTX 1080 (8 GB)|
|Total memory||46 TB for compute nodes + 256 GB for login nodes|
Basic Linux Commands and Available File Systems
File systems available for use:
The NYU HPC clusters have multiple filesystems for users' files. Each filesystem is configured differently to serve a different purpose:
- $HOME (/home/$USER): 20 GB / user; available on all Prince nodes (login, compute)
- $SCRATCH (/scratch/$USER): 5 TB / user; files unused for 60 days are deleted; available on all Prince nodes (login, compute)
- $BEEGFS (/beegfs/$USER): 2 TB / user; files unused for 60 days are deleted; available on all nodes (login, compute)
- $ARCHIVE (/archive/$USER): 2 TB / user; available only on login nodes
- $SLURM_JOBTMP (/state/partition1/$SLURM_JOBID): varies, mostly >100 GB (ext3); not backed up; a separate local filesystem on each compute node, deleted at the end of each job
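Assuming the environment variables listed above are set (they are defined automatically when you log in), you can use them instead of typing full paths:

```
echo $HOME $SCRATCH $BEEGFS      # print the locations of your home, scratch and BeeGFS space
cd $SCRATCH                      # move to your scratch space for large or temporary files
cd $HOME                         # return to your home directory
```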
Navigating the directory structure on Linux and Basic Commands:
We've already seen ssh, which takes us from the host we are on to a different host, and hostname, which tells us which host we are on now. Mostly you'll move around filesystems and directories, which resemble inverted tree structures as shown below schematically:
"pwd" - "print working directory", or "where am I now?".
In Unix, filesystems and directories are arranged in a hierarchy. A forward slash "/" is the directory separator, and the topmost directory visible to a host is called "/". Filesystems are also mounted into this directory structure, so you can access everything that is visible on this host by moving around in the directory hierarchy.
You should see something like the output below.
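For a hypothetical NetID ab123, running "pwd" from the home directory would give something like:

```
$ pwd
/home/ab123
```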
- "cd" - To change to a different directory, use "cd" ("change directory"). You'll need to give it the path to the directory you wish to change into, e.g. "cd /scratch/NetID". You can go up one directory with "cd ..". If you run "cd" with no arguments, you will be returned to your home directory. And if you run "cd -" you will be returned to the directory you were in most recently.
- "mkdir" - To create a new directory, use "mkdir" ("make directory"), e.g. "mkdir testdir".
- "ls" - To see what files are in the current directory, use "ls". If this is your first time using the HPC cluster, "ls" probably won't return anything, because you have no files to list.
There are a couple of useful options for ls:
"ls -l" lists the directory contents in long format, one file or directory per line, with extra information about who owns the file, how big it is, and what permissions are set.
"ls -a" lists hidden files. In Unix, files whose names begin with "." are hidden. This does not stop anything from using those files, it simply instructs ls not to show the files unless the -a option is used.
- "rmdir" - To remove an (empty) directory, use "rmdir", e.g. "rmdir testdir".
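Putting these commands together, a short session might look like this (the directory name "testdir" is just an example):

```
pwd                  # where am I now?
mkdir testdir        # create a new directory
cd testdir           # move into it
cd ..                # go back up one level
ls -la               # list everything here in long format, including hidden files
rmdir testdir        # remove the (empty) directory again
```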
Hands-on Exercise 1:
Copying, moving or deleting files locally:
Copying: The "cp" command makes a duplicate copy of files and directories within a cluster or machine. The general usage is "cp source destination":
Makes a duplicate copy of test_file1.txt with the new name test_file2.txt.
To recursively copy a directory and all of its contents, use "cp -r". That is, a new directory is created containing a copy of everything in the original directory.
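For example (the directory names are illustrative; the file names match the description above):

```
cp test_file1.txt test_file2.txt   # copy a file to a new name
cp -r old_dir new_dir              # recursively copy a directory and all of its contents
```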
Moving: The "mv" command moves or renames files and directories within a cluster or machine. The general usage is "mv source destination":
Renames dummy_file.txt as test_file.txt.
Renames a directory in the same way.
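For example (the directory names are illustrative):

```
mv dummy_file.txt test_file.txt    # rename a file
mv old_dir new_dir                 # rename (or move) a directory
```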
The "rm" (remove) command deletes files and optionally directories within a cluster or machine. There is no undelete in Unix. Once it is gone, it is gone.
Remove a file.
If you use "rm -i", you will be asked to confirm before each file is removed.
Forcibly remove a file, without asking, regardless of its permissions (provided you own the file)
Remove the directory "subdir" only if it is already empty; otherwise the command fails.
Recursively delete a directory and all of its contents.
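For example (file and directory names are illustrative):

```
rm test_file.txt         # remove a file
rm -i test_file.txt      # ask for confirmation before removing
rm -f test_file.txt      # remove without asking, provided you own the file
rmdir subdir             # remove a directory, but only if it is already empty
rm -r old_dir            # recursively delete a directory and everything in it
```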
The head and tail commands:
The command "head filename" displays the first few lines of a text file filename.
The command "tail filename" displays the last few lines of a text file filename.
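For example, for a hypothetical file named results.txt:

```
head results.txt         # the first 10 lines (the default)
tail results.txt         # the last 10 lines
head -n 20 results.txt   # the first 20 lines
```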
Hands-on Exercise 2:
How to map DNA Reads
Text Editor (NANO):
"nano" is a friendly text editor which can be used to edit the contents of an existing file or to create a new file. Here are some of the keyboard shortcuts used in the nano editor.
|Ctrl + O||Save the changes|
|Ctrl + X||Exit nano|
|Ctrl + K||Cut the current line|
|Ctrl + U||Paste the cut text|
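For example, to create or edit a file called my_notes.txt (an illustrative name):

```
nano my_notes.txt
```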
Software and Environment Modules
Environment Modules is a tool for managing multiple versions and configurations of software packages, and is used by many HPC centers around the world.
To use a given software package, you load the corresponding module. Unloading the module afterwards cleanly undoes the changes that loading the module made to your environment, thus freeing you to use other software packages that might have conflicted with the first one.
Working with software packages on the NYU HPC clusters.
The command for seeing what software packages are available is "module avail".
Finding out more about a software package
A module file may include more detailed help for the software package, which can be seen with "module help <module-name>".
You can also see exactly what effect loading the module will have with "module show <module-name>".
You can check which modules are currently loaded in your environment with "module list".
To load a module: "module load <module-name>".
To unload a module: "module unload <module-name>".
Unloading all modules - you can remove all loaded modules from your environment with "module purge".
NOTE: It's a good idea to use "module purge" before loading modules to ensure you have a consistent environment each time you run.
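A typical module session might look like the following (the package name and version, gcc/6.3.0, are illustrative; use "module avail" to see what is actually installed):

```
module avail                 # list the available software packages
module help gcc/6.3.0        # more detailed help for a package, if provided
module show gcc/6.3.0        # see exactly what loading the module would change
module load gcc/6.3.0        # load the package into your environment
module list                  # see which modules are currently loaded
module unload gcc/6.3.0      # cleanly undo the changes made by 'module load'
module purge                 # remove all loaded modules
```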
sbatch Job Submission
In a Linux cluster there are hundreds of computing nodes inter-connected by high-speed networks. The Linux operating system runs on each node individually, and the resources are shared among many users for their technical or scientific computing purposes. Slurm is a cluster software layer built on top of the inter-connected nodes, aiming at orchestrating the nodes' computing activities, so that the cluster can be viewed by its users as a unified, enhanced and scalable computing system. In the NYU HPC clusters, users come from many departments and disciplines, each with their own computing projects, and impose very diverse requirements on hardware, software resources and processing parallelism. Users submit jobs, which compete for computing resources. Slurm is a resource manager and job scheduler, designed to allocate resources and schedule jobs. Slurm is open source software with a large user community, and has been installed on many of the top 500 supercomputers.
As users, we interact with Slurm mostly through a few commands. This hands-on tutorial introduces three of them:
- sbatch: submit jobs
- squeue: monitor jobs
- scancel: terminate jobs before completion
Batch job submission is accomplished with the command sbatch. As with Torque's qsub, we create a bash script describing our job: what resources we need (memory, CPUs, time), what software and processing we want to run, and where to send the job's standard output and error, etc. After a job is submitted, Slurm will find suitable resources, schedule and drive the job execution, and report the outcome back to the user. The user can then return to look at the output files.
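As a minimal sketch of such a job script (the resource values, module name and program are illustrative, not recommendations):

```
#!/bin/bash
#SBATCH --job-name=myTest          # a name for the job
#SBATCH --nodes=1                  # resources requested: 1 node...
#SBATCH --cpus-per-task=1          # ...1 CPU core...
#SBATCH --mem=2GB                  # ...2 GB of memory...
#SBATCH --time=1:00:00             # ...for at most 1 hour
#SBATCH --output=slurm_%j.out      # where to send standard output (%j = job ID)

module purge
module load gcc/6.3.0              # load whatever software the job needs (illustrative)
./my_program                       # the actual work (illustrative)
```

If this is saved as, say, myjob.s, it is submitted with:

```
sbatch myjob.s
```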
The squeue command lists jobs that are running, waiting, completing, etc. It can also display only the jobs owned by a specific user or with a specific job ID.
Run 'man sinfo' or 'man squeue' to see the explanations for the results.
To remove a queued job from the batch queuing system, or to kill a running job, use scancel. To cancel all of the jobs submitted by you, use the '-u' option.
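For example (the job ID shown is made up):

```
squeue -u $USER          # list only your own jobs
squeue -j 1234567        # show the status of one particular job
scancel 1234567          # cancel that job
scancel -u $USER         # cancel all of your jobs
```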
An R Job Example
Basic R jobs
Multiple R versions exist in the HPC environment. To check which ones are available on Prince:
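For example (the exact module names listed will vary over time):

```
module avail r           # list installed modules matching "r", including the R modules
```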
Suppose we want to use R 3.3.2; run these commands:
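Something along these lines (the module name r/intel/3.3.2 used below is illustrative; take the exact name from the "module avail r" listing):

```
module purge                 # start from a clean environment
module load r/intel/3.3.2    # load R 3.3.2 (use the exact name shown by "module avail r")
module list                  # confirm that R and its dependency modules are loaded
```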
We first clean up the environment with 'module purge'. Then we load the selected R version and check what is loaded in the current environment. We can see that R 3.3.2 is indeed loaded along with its dependency modules. Let's try a basic R example, which we name "example.R":
Below is the screen output from running it on Prince:
What is shown above is a simple demo run on a login node. For real interactive analysis, users are encouraged to run on compute nodes using the 'srun' command to request dedicated resources, e.g.:
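For example, to get an interactive shell on a compute node (the resource values are illustrative):

```
srun --nodes=1 --cpus-per-task=2 --mem=4GB --time=2:00:00 --pty /bin/bash
```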
Besides running our analysis interactively, long-running and big data-crunching jobs ought to be submitted to the Slurm batch system. The "example.R" script can be submitted to Slurm to run in batch mode.
Copy example files to your newly created directory.
Below is what the example looks like:
Then create an sbatch job script:
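A sketch of such a script (the resource requests and module name are illustrative; adjust them to the real job):

```
#!/bin/bash
#SBATCH --job-name=exampleR
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4GB
#SBATCH --time=1:00:00
#SBATCH --output=exampleR_%j.out

module purge
module load r/intel/3.3.2        # the same (illustrative) module as in the interactive example

Rscript example.R
```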
Once the sbatch script file is ready, it can be submitted to the job scheduler using sbatch. After the job completes successfully, check the output log file for detailed output information.