The /scratch file system on the NYU HPC clusters is intended for short-term storage of analysis-ready data sets. The /scratch file system is not suitable for storing data for long periods of time.
Files on scratch are NOT backed up
Files on /scratch are NOT backed up. Always back up your important data to /archive. Note that /archive is available only on the HPC login nodes, not on the compute nodes.
Files under /scratch that have not been accessed for more than 60 days will be automatically deleted. To find out which files will be purged, you can run the following command:
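The cluster's own command is not reproduced here. As a minimal sketch, assuming your files live under /scratch/$USER (an assumption; the path may differ on your system), a check along these lines could list purge candidates with the standard find utility. The helper name list_purge_candidates is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list files whose last access is older than a given number of days.
# list_purge_candidates is a hypothetical helper, not a cluster-provided tool.
list_purge_candidates() {
    local dir="$1"          # directory to scan, e.g. /scratch/$USER (assumed path)
    local days="${2:-60}"   # age threshold in days; 60 matches the purge policy

    # find truncates file ages to whole days, so -atime +N matches files
    # last accessed at least N+1 days ago.
    find "$dir" -type f -atime +"$days" -print
}

# Example call:
#   list_purge_candidates "/scratch/$USER"
```

The listing is read-only: it inspects access times without changing them.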
Modifying file access times (using "touch" or any other method) for the purpose of circumventing purge policies may result in the loss of access to the cluster.
Even with the policy in place, if the total usage of /scratch remains above 70%, the top /scratch users will be asked to reduce their /scratch usage.
Please do not simply copy folders containing many small files from /scratch to /archive. Instead, tar and compress the files on /scratch first, copy the tarball to /archive, and then delete the original files from /scratch.
For example, the folder qmmm-md in /scratch has a disk usage of 2.1GB.
Tar and compress the folder with the tar command.
The disk usage of the resulting tarball is 1.1GB.
Copy the tarball to $ARCHIVE.
Delete the related files from /scratch.
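The steps above can be sketched as follows. This is an illustration under stated assumptions, not the exact commands from the original page: the helper name pack_folder is hypothetical, and the location /scratch/$USER/qmmm-md is assumed for the example folder.

```shell
#!/usr/bin/env bash
# Sketch: pack a folder into a single compressed tarball next to it.
# pack_folder is a hypothetical helper name.
pack_folder() {
    local src="${1%/}"              # folder to pack, trailing slash removed
    local tarball="${src}.tar.gz"   # tarball is written beside the folder
    local parent name
    parent="$(dirname "$src")"
    name="$(basename "$src")"

    du -sh "$src"                                  # disk usage before packing
    tar -czf "$tarball" -C "$parent" "$name" || return 1
    tar -tzf "$tarball" > /dev/null || return 1    # confirm the tarball is readable
    du -sh "$tarball"                              # disk usage after packing
}

# Typical sequence on a login node (paths are assumptions):
#   pack_folder "/scratch/$USER/qmmm-md"
#   /share/apps/utils/rsync.sh "/scratch/$USER/qmmm-md.tar.gz" "$ARCHIVE/"
#   rm -rf "/scratch/$USER/qmmm-md" "/scratch/$USER/qmmm-md.tar.gz"
# Run the rm step only after the copy to $ARCHIVE has completed successfully.
```

Listing the tarball before deleting anything is a cheap safeguard: if tar cannot read back what it just wrote, the original folder is still intact.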
Many standard tools can work with compressed tarballs without unpacking them first, such as zcat, less, and zgrep. Their man pages and online tutorials describe typical usage.
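As a rough illustration of working with a tarball in place, the sketch below lists its contents, prints a single member, and searches inside a member, all without extracting to disk. The helper names and the example file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect a .tar.gz without unpacking it. Helper names are hypothetical.

# List every path stored in the tarball.
list_tarball() {
    tar -tzf "$1"
}

# Write one member of the tarball to standard output.
show_member() {
    tar -xzOf "$1" "$2"
}

# Search for a pattern inside one member of the tarball.
grep_member() {
    local pattern="$1" tarball="$2" member="$3"
    tar -xzOf "$tarball" "$member" | grep -- "$pattern"
}

# Example calls (file names are assumptions):
#   list_tarball qmmm-md.tar.gz
#   show_member  qmmm-md.tar.gz qmmm-md/md.log | less
#   grep_member  energy qmmm-md.tar.gz qmmm-md/md.log
# zcat gives the same listing through a pipe:
#   zcat qmmm-md.tar.gz | tar -tf -
```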
Please do not use the Linux cp command directly to copy data from /scratch to /archive on the Prince login nodes, as this creates a heavy load on /archive. Use the /share/apps/utils/rsync.sh shell script wrapper instead. Inside this script, rsync is limited to a bandwidth of no more than 20MB/s, and the -a flag is enabled for archive mode. When the source and destination are on different file systems with different block size settings, directory sizes may look different after the copy. This is fine, since rsync verifies checksums to make sure each transfer is successful. The wrapper is used in much the same way as the rsync command, e.g.:
Please only back up what you actually need to retain. Back up the minimum needed to reproduce your work.
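The original example is not reproduced here. The sketch below shows one plausible way to call the wrapper, with a plain rsync fallback that mimics the settings described above. The fallback flags are an assumption about what the wrapper does internally, and throttled_copy and RSYNC_WRAPPER are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy with the cluster wrapper when present, otherwise with a
# bandwidth-limited rsync. throttled_copy and RSYNC_WRAPPER are hypothetical.
throttled_copy() {
    local wrapper="${RSYNC_WRAPPER:-/share/apps/utils/rsync.sh}"

    if [ -x "$wrapper" ]; then
        # Wrapper arguments follow the usual rsync form: SOURCE... DEST
        "$wrapper" "$@"
    else
        # Assumed equivalent: archive mode, capped at roughly 20MB/s
        # (--bwlimit is given in units of 1024 bytes per second).
        rsync -a --bwlimit=20000 "$@"
    fi
}

# Example call on a login node (paths are assumptions):
#   throttled_copy "/scratch/$USER/qmmm-md.tar.gz" "$ARCHIVE/"
```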
If you see warnings about set_acl similar to the following example, please ignore them. The /scratch file system is Lustre with FACL (file access control lists) enabled, while /archive is a NetApp file system. ACLs set on files in /scratch cannot be preserved when the files are synchronized to /archive, and that is what the warning reports. This also means that when you copy data back from /archive to /scratch and want to share it with others, you will have to set the ACLs again.
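Setting the ACLs again after a restore might look like the sketch below, which uses the standard setfacl utility. The helper name grant_read_access, the user name, and the directory path are all hypothetical, and the permissions you actually need may differ.

```shell
#!/usr/bin/env bash
# Sketch: give another user read access to a restored directory tree.
# grant_read_access is a hypothetical helper name.
grant_read_access() {
    local user="$1"   # account that should be able to read the data
    local dir="$2"    # restored directory on /scratch

    # r grants read; X adds execute only on directories and on files that
    # are already executable, so directories can be traversed.
    setfacl -R -m "u:${user}:rX" "$dir"
}

# Example call (names are assumptions):
#   grant_read_access alice "/scratch/$USER/qmmm-md"
# Inspect the result with:
#   getfacl "/scratch/$USER/qmmm-md"
```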