
The table below shows the File Systems available on the Prince Cluster.

Mountpoint        | Storage Capacity (User Quota) | FS Type | Backed up? | Flushed?                                   | Availability                                   | Variable      | Value
------------------|-------------------------------|---------|------------|--------------------------------------------|------------------------------------------------|---------------|-------------------------------
/home             | 43 TB (20 GB / user)          | ZFS     | Yes        | No                                         | All Prince nodes (login, compute)              | $HOME         | /home/$USER
/scratch          | 1.1 PB (5 TB / user)          | Lustre  | No         | Yes (files unused for 60 days are deleted) | All Prince nodes (login, compute)              | $SCRATCH      | /scratch/$USER
/beegfs           | 500 TB (2 TB / user)          | BeeGFS  | No         | Yes (files unused for 60 days are deleted) | All nodes (login, compute)                     | $BEEGFS       | /beegfs/$USER
/archive          | 700 TB (2 TB / user)          | ZFS     | Yes        | No                                         | Only on login nodes                            | $ARCHIVE      | /archive/$USER
/state/partition1 | Varies, mostly >100 GB        | ext3    | No         | Yes (at the end of each job)               | Separate local filesystem on each compute node | $SLURM_JOBTMP | /state/partition1/$SLURM_JOBID

Moving files across filesystems

To move a file to a different folder on the same filesystem, mv is best as it need not copy the file contents. However, when moving files from one filesystem to another it is best to use rsync, not mv. The file contents will need to be copied regardless, and if mv fails it can damage both the original and the copy. You can avoid this risk by using rsync to ensure the file is correctly copied before deleting it from the original location.
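As a rough sketch (the paths below are placeholders for your own folders):

    # copy the folder to the other filesystem, then remove the original
    # only after rsync has completed successfully
    rsync -av /scratch/$USER/my_results/ /archive/$USER/my_results/ \
        && rm -rf /scratch/$USER/my_results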

See How to copy files to and from the HPC clusters for some tips about using rsync.

How to use /scratch

The /scratch filesystem is configured for large-block I/O, such as sequential reading and writing of large files. However, individual I/O operations are relatively costly, so programs using frequent, small I/O accesses will put a heavy load on the metadata servers, which in extreme cases can cause the system to become unstable. The system administrators generally detect this quickly and may kill a job whose I/O characteristics are stretching the capabilities of the filesystem (if this happens, we will contact you to help configure your job for better stability and performance).

If frequent I/O is unavoidable, we recommend using the node-local, temporary filesystem on /state/partition1 for those files. Note that this filesystem is cleaned up at the end of each job, so your job will need to copy input files onto it at the start and copy any results off it before exiting.
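A minimal sketch of this pattern in a batch script, using the $SLURM_JOBTMP variable from the table above (my_program, input.dat and output.dat are placeholder names):

    #!/bin/bash
    #SBATCH --job-name=local-io-example

    # stage the input onto the node-local temporary filesystem
    cp $SCRATCH/myjob/input.dat $SLURM_JOBTMP/
    cd $SLURM_JOBTMP

    # frequent small I/O now goes to local disk instead of /scratch
    $HOME/bin/my_program input.dat output.dat

    # copy results back before the job ends and the local space is flushed
    cp output.dat $SCRATCH/myjob/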

The accessibility of filesystems from the login and compute nodes is illustrated in the following diagram:

Filesystems generally, and Lustre (/scratch) especially, do not cope well with large numbers of files in a single directory - operations using that directory will be slow and will impact /scratch performance for other users. Please limit the number of files or folders in a single directory to about 1000 - if you need more than this, it is best to split the contents across multiple subfolders.

Managing large numbers of files

Filesystems generally - and high-performance filesystems such as Lustre especially - perform best with a small to moderate number of medium to large files. Some specific issues to be aware of are:

On $SCRATCH

  • Lustre ($SCRATCH) gets its performance mainly from striping - distributing a large file across several disks and several "object storage servers". File metadata operations, on the other hand, have little parallelism available, so a few large read or write operations are vastly faster than many small reads or writes. This is true for reads and writes within a single file as well as for reads or writes across many files.
    • If your job does many small I/O operations, it might be better to copy the file to the node-local temporary filesystem ($SLURM_JOBTMP) or a RAM filesystem on the node at the start of the job, and open the local copy of the file.
    • (But for large reads and writes, $SCRATCH is likely to be faster than local disk.)

  • The default stripe count on $SCRATCH is 4, so each file is striped across disks on 4 object storage servers. If you have a folder filled with files each smaller than 1MB, it is better not to stripe them. You can set the stripe count on a folder (under $SCRATCH) to 1 with:
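    (an illustrative example - the folder name is a placeholder for your own folder)

        lfs setstripe -c 1 $SCRATCH/my_small_files

    This affects files created in the folder from then on; existing files keep their current striping until they are copied or rewritten.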

  • Finding a file within a folder is a serial operation, and the more files in a folder, the longer it takes. With several thousand files, even 'ls' on that folder can take several minutes and may affect the responsiveness of the filesystem for other users.
    • If you have more than about 1000 files in one folder, distribute them over a number of subfolders (one possible approach is sketched below). The best performance will be when the number of subfolders is roughly the square root of the total number of files (e.g., for 1 million files, 1000 subfolders each containing 1000 files).
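A rough sketch of one way to do this, assuming files matching a placeholder pattern sample_*.dat in the current folder (adapt the pattern and subfolder count to your own data):

    # spread the files across 100 subfolders, round-robin
    mkdir -p sub_{00..99}
    i=0
    for f in sample_*.dat; do
        mv "$f" "sub_$(printf '%02d' $((i % 100)))/"
        i=$((i + 1))
    done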

On $ARCHIVE

  • The backup regime on /archive is optimized for small numbers of large files - 1000 files of 1 KB each take 1000 times as long to back up as 1 file of 1 MB! Too many small files can prevent the backup from completing in time.
    • When archiving a collection of small files, please tar the files first. You can send a collection of files to /archive with the command:
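      (an illustrative example - my_run_dir and my_run_dir.tar are placeholder names)

          tar cvf $ARCHIVE/my_run_dir.tar my_run_dir/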

      And fetch it again with:
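          # extract under $SCRATCH, using the same placeholder names as above
          cd $SCRATCH
          tar xvf $ARCHIVE/my_run_dir.tar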

/scratch Policy

The /scratch storage system is a shared resource that needs to run as efficiently as possible for the benefit of all. All HPC account holders have a /scratch disk space quota of 5 TB and an inode quota of 1 million. There is no system backup for data in /scratch; it is the user's responsibility to back up data. We cannot recover any data in /scratch, including files lost to system crashes or hardware failure, so it is important to make copies of your important data regularly.
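To check your current usage and file count against these quotas, one option (a sketch using the standard Lustre quota tool, assuming it is available on the login nodes) is:

    lfs quota -u $USER /scratch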

  • All inactive files older than 60 days will be removed. It is a policy violation to use scripts to change the file access time. Any user found violating this policy will have their HPC account locked; a second violation may result in the account being turned off.
  • We strongly urge users to clean up their data in /scratch regularly, backing up any files they need to retain to /archive or elsewhere.
  • All users will be asked to clean up if total /scratch usage rises above 75%, since usage beyond that level decreases scratch filesystem performance.
  • We retain the right to clean up files on /scratch at any time if it is needed to improve system performance.

Some recommendations:

  • Do not put important source code, scripts, libraries, or executables in /scratch. These important files should be stored in /home.
  • Do not create soft links in /home pointing to folders in /scratch as a way of accessing /scratch.
  • We strongly suggest working with a small number of large files instead of many small files.
  • /scratch is optimized for infrequent, large reads and writes. For temporary files that are accessed frequently while a job is running, please use the local disk on the compute node, or even a RAM filesystem on the node, to reduce the I/O load on the /scratch filesystem.

How to use /archive

The $ARCHIVE filesystem is intended for longer-term storage of simulation results.

The backup system used for $ARCHIVE can only handle ASCII characters in filenames; if a filename contains non-ASCII characters, the file will not be backed up.
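One way to look for such filenames before they reach the backup (a sketch assuming GNU grep with PCRE support):

    # list any path under your archive space that contains non-ASCII bytes
    find /archive/$USER | LC_ALL=C grep -P '[^\x00-\x7F]'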

 

  • The backup regime on /archive is optimized for small numbers of large files - when archiving a collection of small files, please tar the files first. You can send a collection of files to /archive with the command:
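    (an illustrative example - my_run_dir and my_run_dir.tar are placeholder names)

    tar cvf $ARCHIVE/my_run_dir.tar my_run_dir/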

And fetch it again with:
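    # extract under $SCRATCH, using the same placeholder names as above
    cd $SCRATCH
    tar xvf $ARCHIVE/my_run_dir.tar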

 

  • You can extract just a portion of the tar file:
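    (using the placeholder archive name my_run_dir.tar from above)

    cd $SCRATCH
    tar xvf $ARCHIVE/my_run_dir.tar my_run_dir/subdir1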

This will create a directory $SCRATCH/my_run_dir/subdir1 and put the tarred contents of subdir1 into it.

 

  • To extract my_run_dir/subdir1 directly into scratch (without the my_run_dir/ prefix), use --strip-components=<n>:
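    (again using the placeholder archive name my_run_dir.tar; one leading path component is stripped, so the files land in $SCRATCH/subdir1)

    cd $SCRATCH
    tar xvf $ARCHIVE/my_run_dir.tar --strip-components=1 my_run_dir/subdir1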

 

  • Finally, to see the contents of a tar file:
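    (my_run_dir.tar is the placeholder archive name from above)

    tar tvf $ARCHIVE/my_run_dir.tar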

 

 

Group Quotas on /archive

HPC accounts include an allocation of 2 TB of storage in /archive. An HPC sponsor may request that his/her quota and the quota of his/her research group be combined to create a single larger space. Some conditions:

  • Requests must be made by the sponsor
  • All of the members of the group must share the same sponsor
  • All group members must be active users of the HPC system

The sponsor's account will hold the full quota and each individual's quota will be set to 0.

Requests will be considered by HPC management and assessed on the basis of need and available capacity.

The maximum size of a group quota is 10 TB. Additional storage can be added for $500/TB/year (subject to availability).

To apply for a group quota, please use the form at this link. You will receive a response to your request within one week.

Extra archive space on /work

 (more..)

Automatic File Deletion Policy

The table below describes the policy concerning the automatic deletion of files.

Space     | Automatic File Deletion Policy
----------|-------------------------------
/home     | None
/archive  | None
/scratch  | Files may be deleted as needed without warning if required for system productivity.
/work     | None
ALL       | All /home and /archive files associated with expired accounts will be automatically deleted 90 days after account expiration. /scratch files will be automatically deleted no later than 30 days after account expiration.

Recovering files from backup

This page is retained from an earlier version of the HPC wiki only for reference.


 
