The NYU HPC clusters have five filesystems for users' files. Each filesystem is configured differently to serve a different purpose:

 

/home ($HOME)
  • Purpose: Program development space; storing small files you want to keep long term, e.g. source code and scripts.
  • Visibility: login and compute nodes. Starting with the installation of Mercer there is a unified /home filesystem, served from the same 7420 storage system as /archive and /work.
  • Backed up: Yes (ASCII filenames only)
  • Flushed: No
  • Allocation: 20GB (unified /home, mounted on Mercer)
  • Cost for additional storage: N/A
  • Total size: 600TB (unified /home; space shared with /archive and /work)
  • File system: ZFS, NFS-mounted

/archive ($ARCHIVE)
  • Purpose: Long-term storage, mounted only on login nodes. Best for large files; please tar collections of small files when archiving. Groups may request a common aggregate archive space.
  • Visibility: login nodes only. Common to all clusters.
  • Backed up: Yes (ASCII filenames only)
  • Flushed: No
  • Allocation: 2TB
  • Cost for additional storage: $500/year for 1TB
  • Total size: 600TB (shared with /work and unified /home)
  • File system: ZFS

/scratch ($SCRATCH)
  • Purpose: Computational work space. Best suited to large, infrequent reads and writes. Files are deleted after 60 days without use.
  • Visibility: login and compute nodes. Common to all clusters.
  • Backed up: No
  • Flushed: Files not accessed for 60 days
  • Allocation: 5TB; inode quota: 1 million (see policy)
  • Cost for additional storage: N/A
  • Total size: 410TB
  • File system: Lustre

/work ($WORK)
  • Purpose: Medium-term, non-backed-up storage mounted on login and compute nodes.
  • Visibility: login and compute nodes.
  • Backed up: No
  • Flushed: No
  • Allocation: 500GB
  • Cost for additional storage: N/A
  • Total size: 600TB (shared with /archive and unified /home)
  • File system: ZFS

/state/partition1 ($PBS_JOBTMP)
  • Purpose: Small, node-local filesystem cleaned up at the end of each Torque job. For small, frequent reads and writes. The environment variable is defined in batch jobs (via the qsub wrapper).
  • Visibility: compute nodes only; local to each compute node.
  • Backed up: No
  • Flushed: At the end of each job
  • Allocation: Varies; generally >100GB
  • Cost for additional storage: N/A
  • Total size: Varies
  • File system: ext3

Memory disk ($PBS_MEMDISK)
  • Purpose: Optional, node-local memory filesystem. Like $PBS_JOBTMP but smaller and faster; see "Using fast local disk" below for usage.
  • Visibility: compute nodes only; local to each compute node.
  • Backed up: No
  • Flushed: At the end of each job
  • Allocation: Default 8GB; a specific amount can be requested (but must fit within node memory)
  • Cost for additional storage: N/A
  • Total size: Varies
  • File system: tmpfs or ramfs

Only files and directories with ASCII-only filenames are backed up. Our backup system does not handle unicode in file or directory names; such files and directories (including all files and directories under them) will be bypassed.

Important: Of all these spaces, only /scratch should be used for computational purposes. Please do not write to /home when running jobs, as it can easily fill up.

Note: The capacity of the /home file system varies from cluster to cluster. Unlike /scratch and /archive, /home is not mounted across clusters: each cluster has its own /home, its own user base and its own /home allocation policy.

To purchase additional storage, send email to hpc@nyu.edu.
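As noted for /archive above, collections of small files should be bundled into a single tar file before archiving. A minimal sketch with illustrative paths, run on a login node (since /archive is mounted only there):

    # Bundle a directory of small files into one compressed archive on /archive
    tar -czf $ARCHIVE/myproject_results.tar.gz -C $SCRATCH/myproject results

    # Later, unpack it back into scratch to work with the files again
    tar -xzf $ARCHIVE/myproject_results.tar.gz -C $SCRATCH/myproject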

Using /work

Users now have a default quota of 500GB on /work, which is NFS mounted on the Mercer login and compute nodes.

/work is intended as a medium-term archive space which is visible to the compute nodes - jobs are still best run in /scratch, and large I/O during a run should go to /scratch. /work is mounted on the compute nodes via NFS, which has higher latency and lower bandwidth than the Infiniband connection to /scratch. Files on /work are not flushed, but neither are they backed up.
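For example, a dataset kept on /work can be synced into /scratch before a run, and the results copied back afterwards. A sketch with illustrative paths:

    # Before the run: copy the dataset from /work into scratch
    rsync -av $WORK/reference_data/ $SCRATCH/myrun/reference_data/

    # ... run the job from $SCRATCH, doing its large I/O there ...

    # After the run: keep a copy of the results on /work (remember, /work is not backed up)
    rsync -av $SCRATCH/myrun/results/ $WORK/myrun/results/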

Using fast local disk

Each node has some local disk. When you start a job, the environment variable $PBS_JOBTMP points to a directory on the local disk which your job can use. This is particularly suitable for jobs which write many small or temporary files, as these typically do not perform well on shared parallel filesystems.

All files in $PBS_JOBTMP are deleted at the end of the job.

Any files you need to keep should be copied to $HOME, $WORK or $SCRATCH before the end of the job.
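A sketch of that pattern inside a job script (the program, input and output names are illustrative):

    # Work on the fast node-local disk provided for this job
    cd $PBS_JOBTMP

    # Stage input in from /scratch
    cp $SCRATCH/myrun/input.dat .

    # Run the program; its many small temporary files stay on local disk
    $HOME/code/my_app input.dat > output.log

    # Copy anything worth keeping back before the job ends and $PBS_JOBTMP is wiped
    cp output.log $SCRATCH/myrun/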

There is one local disk per physical node, so if you use "-l nodes=3:ppn=4", and your job is run on:

  • 4 cores of one node (say, compute-1-1)
  • another 4 cores on that same physical node, and 
  • 4 cores on a different node (say, compute-2-10)

then $PBS_JOBTMP will be the same for the first two "nodes" and different for the third, because the first two "nodes" share a physical node.
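To see which physical nodes (and therefore which local disks) a running job was given, you can list the unique hostnames in the standard Torque node file:

    # $PBS_NODEFILE lists one line per allocated core; the unique hostnames are the
    # physical nodes, and each physical node has its own $PBS_JOBTMP on its own local disk
    sort -u $PBS_NODEFILE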

You can also request a part of the RAM on a node to be presented as a disk. This is like using local disk, but faster still - however, it occupies memory on the node.

 To do this, specify:
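A sketch of the two request forms, assuming the qsub wrapper accepts a memdisk entry in the -l resource list (the exact keyword and syntax are an assumption; check the cluster's qsub documentation):

    # First form: request a memory disk of the default (nominal) size
    #PBS -l nodes=1:ppn=4,memdisk

    # Second form: request a memory disk of a specific size, here 20GB
    #PBS -l nodes=1:ppn=4,memdisk=20gb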

This will reserve some memory and present it as a local disk, which you can access with the environment variable $PBS_MEMDISK.

In the second request, a memory disk of 20GB is created. In the first, a memory disk of "some nominal amount" is created, normally 8GB. If you have exclusive use of a node and request memdisk with no specific size, the memdisk will grow as you write to it, until the job plus the memdisk have entirely filled the node's memory. This gives flexibility for when you don't know how much memdisk you need, but may result in the job (or the node) crashing if you use too much memory plus memdisk.

If you did not request memdisk, $PBS_MEMDISK will point to the local disk space at $PBS_JOBTMP.

See Managing Data for information about using rsync to synchronize directories.

Working with NYU HPC filesystems

So what does a job look like on NYU HPC, taking into consideration the filesystems?

  • $HOME is a good place to keep scripts and code
  • $WORK is a good place to keep a copy of data you will be using over the course of several months. You should have a copy of this somewhere else too, as it is not backed up
  • $SCRATCH is for active data - this is where your jobs should run. It isn't backed up, and is periodically flushed, so make sure you have another copy of anything important.
  • $ARCHIVE is for longer-term backup of data you are not frequently using

Below is an annotated example of how a job can use these filesystems:
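A sketch of such a job script; the resource requests, program name and paths are illustrative:

    #!/bin/bash
    #PBS -N example_job
    #PBS -l nodes=1:ppn=4,walltime=04:00:00

    # 1. Run from a job-specific directory on /scratch, the space intended for computation
    RUNDIR=$SCRATCH/run_${PBS_JOBID}
    mkdir -p $RUNDIR
    cd $RUNDIR

    # 2. Stage input data kept on /work (medium term, not backed up) into scratch
    rsync -av $WORK/mydata/ mydata/

    # 3. Point temporary files at the fast node-local disk provided for this job
    export TMPDIR=$PBS_JOBTMP

    # 4. Run the program; the code itself lives in $HOME
    $HOME/code/my_app mydata/ > results.out

    # 5. Keep a copy of the results on /work; from a login node, tar and copy
    #    long-term copies to $ARCHIVE
    rsync -av results.out $WORK/results/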

There is more about using rsync at Keeping directories in sync with rsync

 

 
