HPC-archive (found at /archive) is NYU HPC's permanent storage for research data.
On the Internet, a bastion host is the only host computer that an organization allows to be addressed directly from the public network, and it is designed to screen the rest of the organization's network from security exposure. The NYU HPC bastion host is hpc.nyu.edu; if you have an account, you can reach it by logging in over ssh with your NetID, e.g. ssh NetID@hpc.nyu.edu.
A group of servers and other resources that act like a single system and enable high availability and, in some cases, load balancing and parallel processing.
The computational units that batch or interactive jobs are processed on. While accessible via interactive jobs, compute nodes are not meant to be accessed directly by users. In the article "High Performance Computing Rocks at NYU," published in the Spring 2004 issue of NYU ITS's Connect magazine, Will Wilson, Yingkai Zhang, and David Ackerman write:
"The compute nodes are the workhorses of the cluster. The CPU-intensive calculations researchers submit are run on the compute nodes." At NYU HPC, compute nodes are used for interactive sessions and for running batch jobs via the scheduler. User access to compute nodes is available only via the scheduler process.
Floating-point Operations Per Second. 1 teraflop = 1 trillion floating-point operations per second. Floating-point is, according to IBM, "a method of encoding real numbers within the limits of finite precision available on computers." Floating-point encoding lets very large and very small numbers be represented compactly. A floating-point number is expressed as a basic number, or mantissa, an exponent, and a number base, or radix (which is often assumed). Computer hardware almost always uses base 2, as in the IEEE 754 binary formats, although decimal (base 10) formats also exist. Floating-point operations are usually performed by dedicated hardware floating-point units that have their own floating-point registers. The computation of floating-point numbers is often required in scientific or real-time processing applications, and FLOPS is a common measure for any computer that runs these applications.
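The mantissa/exponent split can be seen directly with Python's standard `math.frexp`, which decomposes a float into a base-2 mantissa and exponent. This is a minimal illustration of the encoding idea only; it does not show how any particular processor lays out the bits, and the sample values are arbitrary.

```python
import math


def decompose(x: float) -> tuple[float, int]:
    """Return (mantissa, exponent) such that x == mantissa * 2**exponent."""
    # math.frexp uses base 2 and returns 0.5 <= abs(mantissa) < 1 for nonzero x.
    return math.frexp(x)


for value in (6.0, 0.15625, 1.5e300):
    mantissa, exponent = decompose(value)
    # math.ldexp(m, e) computes m * 2**e, so this rebuilds the value exactly.
    assert math.ldexp(mantissa, exponent) == value
    print(f"{value!r} = {mantissa!r} * 2**{exponent}")

# Finite precision: 0.1 and 0.2 have no exact base-2 representation.
print(0.1 + 0.2 == 0.3)  # False
```

For example, 6.0 is reported as 0.75 * 2**3.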
Stands for "GNU's Not Unix!" The GNU Project was launched in 1984 to develop a complete Unix-like operating system that is free. GNU is commonly used with the Linux kernel; the combination is GNU/Linux, which, according to the GNU Project, is often incorrectly called simply "Linux." GNU makes a variety of software, compilers, and routines available to the public for free. See GNU.org.
An inode is a data structure on a traditional Unix-style file system. An inode stores basic information about a regular file, directory, or other file system object.
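On a Unix-like system, the metadata kept in a file's inode can be inspected with Python's standard `os.stat`. The sketch below creates a throwaway file so it is self-contained; note that the file's name is not among the fields, because names live in directory entries that point to inode numbers. On other platforms some fields may be reported differently.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on existing files.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

try:
    info = os.stat(path)
    # These fields describe the file object (inode metadata), not its contents.
    print("inode number:", info.st_ino)
    print("size in bytes:", info.st_size)
    print("hard links:", info.st_nlink)
    print("permission bits:", oct(info.st_mode & 0o777))
    print("last modified (epoch seconds):", info.st_mtime)
finally:
    os.remove(path)
```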
On high performance computing clusters, login nodes serve multiple functions. From a login node you can submit and monitor batch jobs, analyze computational results, run editors, plotting tools, debuggers, and compilers, do housekeeping chores such as adjusting shell settings and copying files, and in general manage your account. At NYU HPC, login nodes are used for file editing, data transfer, light compiling and debugging, and initiating batch and interactive sessions via the scheduler.
Moab is a scheduler developed by Cluster Resources Inc. (now Adaptive Computing) that allocates resources to the jobs requesting them. It does so by collecting all the information that Torque (or another resource manager) can provide about currently running jobs, available nodes, and other resources. After it has scheduled resources for a job, Moab instructs Torque to execute the job on the allocated resources.
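To make the idea of "allocating resources to jobs" concrete, here is a deliberately simplified first-fit sketch: each job, in submission order, goes to the first node with enough free cores, and jobs that do not fit stay queued. This is not Moab's algorithm; real schedulers such as Moab apply much richer policies. The node names, job IDs, and core counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cores: int


@dataclass
class Job:
    job_id: str
    cores: int  # cores requested on a single node (a simplification)
    assigned: Optional[str] = None


def schedule(jobs: list, nodes: list) -> list:
    """First-fit placement; returns the jobs that could not be placed."""
    queued = []
    for job in jobs:
        for node in nodes:
            if node.free_cores >= job.cores:
                node.free_cores -= job.cores
                job.assigned = node.name
                break
        else:
            # No node had enough free cores: the job waits in the queue.
            queued.append(job)
    return queued


nodes = [Node("node01", 8), Node("node02", 4)]
jobs = [Job("j1", 6), Job("j2", 4), Job("j3", 4)]
waiting = schedule(jobs, nodes)
for job in jobs:
    print(job.job_id, "->", job.assigned or "queued")
```

Here j1 lands on node01, j2 on node02, and j3 waits because neither node has four cores left.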
NYU HPC uses an open source software package called "Environment Modules" (Modules for short), which allows you to add various path definitions to your shell environment.
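Environment Modules itself is driven by modulefiles and the `module` command (for example, `module load`). The Python below only mimics, on a plain dictionary, the effect of a modulefile's `prepend-path` directive: putting a package's directory at the front of a colon-separated path variable so the shell finds that package first. The install prefix is hypothetical.

```python
def prepend_path(env: dict, var: str, directory: str) -> None:
    """Place directory at the front of a colon-separated path variable,
    roughly mimicking a modulefile's prepend-path directive."""
    existing = [p for p in env.get(var, "").split(":") if p]
    env[var] = ":".join([directory] + existing)


env = {"PATH": "/usr/bin:/bin"}
# Hypothetical install prefix for a package "loaded" through a module.
prefix = "/share/apps/example/1.0"
prepend_path(env, "PATH", prefix + "/bin")
prepend_path(env, "LD_LIBRARY_PATH", prefix + "/lib")
print(env["PATH"])             # /share/apps/example/1.0/bin:/usr/bin:/bin
print(env["LD_LIBRARY_PATH"])  # /share/apps/example/1.0/lib
```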
Message Passing Interface (MPI) is a specification for an API that allows many computers to communicate with one another. It is used in computer clusters and supercomputers. MPI is a language-independent communications protocol used to program parallel computers. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation."
PBS (Portable Batch System) is a resource manager whose name is often used interchangeably with Torque. Torque is an open source resource manager derived from OpenPBS, an open source version of PBS, and is currently maintained and offered by Adaptive Computing. In the HPC user community the nomenclature remained tied to PBS, hence the term "PBS script" for what today would really be a Torque script.
PuTTY is a client program for the SSH, Telnet and Rlogin network protocols. These protocols are all used to run a remote session on a computer, over a network. PuTTY implements the client end of that session: the end at which the session is displayed, rather than the end at which it runs. You can download it from http://www.nyu.edu/its/software
PBS/Torque queues, or "classes," as Moab refers to them, represent groups of computing resources with specific parameters. For example, a queue with a 12-hour maximum runtime, or "walltime," accepts jobs requesting 12 hours or less.
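The walltime rule above amounts to a simple comparison: a job is eligible for every queue whose limit is at least the walltime it requests. The queue names and limits in this sketch are hypothetical, not NYU HPC's actual configuration.

```python
# Hypothetical queues and their maximum walltimes, in hours.
QUEUES = {"short": 1, "standard": 12, "long": 96}


def eligible_queues(requested_hours: float) -> list[str]:
    """Return the queues whose walltime limit covers the requested walltime."""
    return [name for name, limit in QUEUES.items() if requested_hours <= limit]


print(eligible_queues(10))  # ['standard', 'long']
```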
Restart files contain all the information required to continue a run from the point at which a job terminated. Their main purpose is to divide large, long-running jobs into segments so that if a job is interrupted because it has exceeded its walltime, it can be started again from the point where it stopped. Restart files can also minimize the time, effort, and data lost when a long software process, such as a simulation, is interrupted by a hardware or software failure or by resource unavailability. NYU HPC strongly recommends that you develop restart files for simulation packages that require a long walltime. Make sure they are tested and ready for use before running such jobs.
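A minimal checkpoint/restart sketch, assuming a toy "simulation" whose whole state fits in a small JSON file: the program writes a restart file after every step and, when started again, resumes from that file instead of from scratch. The file name and the `max_steps_this_run` parameter (which stands in for a job being cut off by its walltime) are hypothetical; real simulation packages have their own restart formats.

```python
import json
import os
import tempfile
from typing import Optional


def run(total_steps: int, checkpoint: str, max_steps_this_run: Optional[int] = None) -> dict:
    """Advance a toy simulation, writing a restart file after every step."""
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            state = json.load(f)  # resume from the last completed step
    else:
        state = {"step": 0, "total": 0}  # fresh start

    done_now = 0
    while state["step"] < total_steps:
        if max_steps_this_run is not None and done_now >= max_steps_this_run:
            break  # simulated interruption; the restart file is already up to date
        state["step"] += 1
        state["total"] += state["step"]  # stand-in for real computation
        done_now += 1
        # Write to a temporary file, then rename it over the restart file,
        # so an interruption mid-write cannot leave a half-written restart file.
        tmp = checkpoint + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, checkpoint)
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "restart.json")
first = run(10, ckpt, max_steps_this_run=4)  # job stops after 4 steps
second = run(10, ckpt)  # resubmitted job resumes at step 5
print(first, second)
```

The second call picks up at step 5 and finishes with the same result an uninterrupted run would have produced.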
/scratch is a file system, traditional on high performance computing clusters, that computational jobs read input from and write output to. /scratch file systems are most often provided by a parallel file system and mounted on all compute and login nodes of a cluster. Parallel file systems, such as Lustre or GPFS, offer high performance and can scale to thousands of nodes. Use /scratch when, for example, you are downloading and uncompressing applications, reading and writing input/output data during a batch job, or working with very large datasets.
Secure Copy (SCP) is a protocol for copying files between machines. The scp command is used extensively on HPC clusters to stage in data from outside resources.
SSH File Transfer Protocol (SFTP, often expanded as Secure File Transfer Protocol), used through the sftp client, copies files between machines.
Secure Shell (SSH), sometimes known as Secure Socket Shell, is a Unix-based command interface and protocol for securely accessing a remote computer. It is widely used by network administrators to control Web and other kinds of servers remotely. SSH is actually a suite of three utilities, slogin, ssh, and scp, which are secure versions of the earlier Unix utilities rlogin, rsh, and rcp. SSH secures sessions in several ways: the server proves its identity to the client with a host key, the user is authenticated with a password or a key pair, and all traffic, including passwords, is encrypted.
OpenSSH is a network connectivity tool that encrypts all traffic, including passwords, to effectively eliminate eavesdropping, connection hijacking, and other network-level attacks. Tools for creating and using SSH keys are part of the OpenSSH bundle. On NYU HPC clusters, SSH keys allow password-less access between compute nodes while running batch or interactive parallel jobs.
A measure of a computer's speed, which can be expressed as:
- A trillion floating-point operations per second
- 10 to the 12th power (10^12) floating-point operations per second
- 2 to the 40th power (1,099,511,627,776) flops, a binary approximation sometimes used
The term HPC generally applies to systems with speeds over a teraflop. Also see the definition for FLOPS in this Glossary.
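A rough sketch of how a FLOPS figure is obtained: count the floating-point operations an algorithm performs and divide by the elapsed time. A naive n x n matrix multiply produces n*n entries, each needing n multiplications and n-1 additions. Pure Python is interpreted and far slower than compiled numerical libraries, so the rate printed here reflects interpreter overhead far more than hardware peak; the matrix size is arbitrary.

```python
import time

TERA = 10**12
print(2**40, 2**40 / TERA)  # the binary figure is about 1.0995 trillion


def matmul_flops(n: int) -> int:
    """Floating-point operations in a naive n x n matrix multiply:
    n*n output entries, each needing n multiplies and n-1 adds."""
    return n * n * (2 * n - 1)


def matmul(a: list, b: list) -> list:
    """Naive triple-loop matrix multiply of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


n = 60
a = [[1.0] * n for _ in range(n)]
b = [[2.0] * n for _ in range(n)]
start = time.perf_counter()
c = matmul(a, b)
elapsed = time.perf_counter() - start
rate = matmul_flops(n) / elapsed
print(f"about {rate / 1e9:.3f} GFLOPS, or {rate / TERA:.6f} teraflops")
```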
Torque is an open source resource manager responsible for collecting status and health information from compute nodes and keeping track of jobs running in the system. It is also responsible for spawning the actual executables associated with a job, e.g., running the executable on the corresponding compute node. Client commands for submitting and managing jobs can be installed on any host, but in general are installed and used from the login nodes.
Walltime is the maximum length of real (wall-clock) time, specified in the PBS script, for which a job is allowed to run on a batch system.
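In a PBS script, walltime is typically requested with a directive such as `#PBS -l walltime=12:00:00`. The helpers below convert between that HH:MM:SS form and seconds; they assume the HH:MM:SS form only and do not handle any other format the resource manager may accept. The function names are illustrative.

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert an HH:MM:SS walltime string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_walltime(total: int) -> str:
    """Format a number of seconds as an HH:MM:SS walltime string."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


requested = "12:00:00"
print(f"#PBS -l walltime={requested}")
print(walltime_to_seconds(requested))  # 43200
print(seconds_to_walltime(36 * 3600 + 90))  # 36:01:30
```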
A secure file transfer client for Windows, used to copy files to remote systems. You can download it from http://www.nyu.edu/its/software
The X Window System (commonly X or X11) is a software system and network protocol that provides a graphical user interface (GUI) for networked computers. To display graphical output originating from NYU HPC systems, you will need X display software running on your desktop (in X terminology this is the X server, and the remote programs that draw on it are the clients).