

The /scratch file system on the NYU HPC clusters is intended for short-term storage of analysis-ready data sets.  The /scratch file system is not suitable for storing data for long periods of time.
Files on scratch are NOT backed up

Files on /scratch are NOT backed up. Always back up your important data to /archive. /archive is only available on HPC login nodes, not on compute nodes.


Files under /scratch that have not been accessed for more than 60 days will be automatically deleted. To find out which files will be purged, you can run the following command:


Code Block
$ lfs find /scratch/$USER -atime +60 | xargs ls -l --time=atime --sort=time
Modifying file access times (using "touch" or any other method) for the purpose of circumventing purge policies may result in the loss of access to the cluster.
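
To get a sense of how much data the purge would affect, the same access-time test can be turned into a count and a total size. The function below is only a sketch: the name purge_summary is made up for illustration, and it assumes GNU find (for -printf) rather than lfs find, so on a large Lustre tree it may be slower than the command above.

```shell
# Summarize regular files under a directory that have not been accessed
# in more than N days (default 60).
# Assumption: GNU find is available; purge_summary is a hypothetical helper,
# not a tool provided on the cluster.
purge_summary() {
    local dir=$1 days=${2:-60}
    # Print the size in bytes of each matching file, then count and sum them.
    find "$dir" -type f -atime +"$days" -printf '%s\n' |
        awk '{ n++; bytes += $1 } END { printf "%d files, %d bytes\n", n, bytes }'
}
```

Calling purge_summary /scratch/$USER prints a single line with the number of matching files and their combined size in bytes.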

Even with the policy in place, if the total usage of /scratch remains above 70%, the top /scratch users will be asked to reduce their /scratch usage.

Please do not simply copy folders containing many small files from /scratch to /archive. Instead, tar and compress the files on /scratch first, copy the tarballs to /archive, and then delete the original files from /scratch.

For example, the folder qmmm-md in /scratch uses 2.1GB of disk space:

Code Block
[sw77@log-0 sw77]$ pwd
[sw77@log-0 sw77]$ du -sh qmmm-md
2.1G    qmmm-md

Tar and compress the folder with the command:

Code Block
$ tar -zcvf qmmm-md.tar.gz qmmm-md

The resulting tarball uses 1.1GB of disk space:

Code Block
[sw77@log-0 sw77]$ du -sh qmmm-md.tar.gz 
1.1G    qmmm-md.tar.gz

Copy the tarball to $ARCHIVE:

Code Block
$ rsync qmmm-md.tar.gz $ARCHIVE

Delete the related files from /scratch:

Code Block
$ rm -rf qmmm-md qmmm-md.tar.gz 
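
Because the delete step is irreversible, it may be worth checking the tarball before removing the original folder. The sketch below is one possible check, not an official procedure: verify_tarball is a hypothetical helper that confirms the archive can be read end to end and that it lists as many entries as the source folder currently contains. It does not compare file contents.

```shell
# Return 0 only if the tarball is readable and lists the same number of
# entries as the source directory. verify_tarball is a hypothetical helper.
# Caveat: file names containing newlines would skew the counts.
verify_tarball() {
    local dir=$1 tarball=$2
    local parent name listing in_tar on_disk
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    # Listing a .tar.gz decompresses the whole stream, so a truncated or
    # corrupt archive makes tar fail here.
    listing=$(tar -tzf "$tarball") || return 1
    in_tar=$(printf '%s\n' "$listing" | wc -l)
    # Count the directory itself plus everything below it, as tar does.
    on_disk=$(cd "$parent" && find "$name" | wc -l) || return 1
    [ "$in_tar" -eq "$on_disk" ]
}
```

For the example above, verify_tarball qmmm-md qmmm-md.tar.gz && echo OK would print OK only when both checks pass.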


Many standard tools also work directly with tarballs, such as zcat, less, and zgrep. The following pages are useful starting points:

GNU tar manuals

How do I Compress a Whole Linux or UNIX Directory?

How can I view the contents of tar.gz file without extracting from the command-line?
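
As a small illustration of inspecting a tarball without unpacking it, the two functions below wrap standard tar options. The function names tar_list and tar_cat are invented for this example; the -t (list) and -O (extract to standard output) options are part of GNU tar.

```shell
# List the entries of a gzip-compressed tarball without extracting it.
tar_list() {
    tar -tzf "$1"
}

# Write one member of the tarball to standard output (nothing is written
# to disk), so it can be piped into less or grep.
tar_cat() {
    tar -xzOf "$1" "$2"
}
```

For example, tar_list qmmm-md.tar.gz | less pages through the file names, and tar_cat qmmm-md.tar.gz followed by a member path prints that single file.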

Please do not use the Linux cp command directly to copy data from /scratch to /archive on the Prince login nodes, as this creates a heavy load on /archive. Instead, use the /share/apps/utils/ shell script wrapper. The script runs rsync with the -a (archive mode) flag and limits its bandwidth to no more than 20MB/s.

When the source and destination are on different file systems with different block size settings, directory sizes may look different after the copy. This is expected; rsync verifies checksums to make sure the data is transferred successfully. The wrapper is used much like the rsync command itself, e.g.:

Code Block
$ /share/apps/utils/ /scratch/work/sw77/CH3ClCl $ARCHIVE/zzz/
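
The contents of the site wrapper are not shown here, but the behaviour described above (archive mode plus a bandwidth cap) can be approximated as follows. This is a hypothetical stand-in, not the actual script: the function name throttled_rsync and the exact limit value are assumptions. rsync's --bwlimit option takes units of 1024 bytes per second, so 20000 corresponds to roughly 20MB/s.

```shell
# Hypothetical stand-in for the site wrapper (the real script may differ):
# run rsync in archive mode with a bandwidth cap of about 20MB/s.
throttled_rsync() {
    # --bwlimit is given in units of 1024 bytes per second.
    rsync -a --bwlimit=20000 "$@"
}
```

It takes the same source and destination arguments as rsync. On the cluster itself, the provided wrapper should still be preferred.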

Please only back up what you actually need to retain. Back up the minimum needed to reproduce your work.

ACL Warning

If you see warnings about set_acl similar to the following example, please ignore them. The /scratch file system is Lustre with FACLs enabled, while /archive is a NetApp file system. FACLs set on files in /scratch cannot be preserved when the files are synchronized to /archive, which is what the warning reports. This also means that when you copy data back from /archive to /scratch and want to share it with others, you will have to reset the FACLs yourself.

Code Block
rsync: set_acl: sys_acl_set_file(ConsumerSearch, ACL_TYPE_DEFAULT): Operation not supported (95)
sent 269 bytes  received 16 bytes  570.00 bytes/sec
total size is 23280  speedup is 81.68
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
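
For the FACL reset mentioned above, one way to restore read access for a collaborator after copying data back to /scratch is with setfacl. This is a minimal sketch under assumptions: share_with_user is a hypothetical helper, the user name passed to it is a placeholder, and the permissions your group actually needs may differ.

```shell
# Grant another user read access (plus directory traversal) to a tree that
# was copied back to /scratch. share_with_user is a hypothetical helper.
share_with_user() {
    local dir=$1 user=$2
    # -R applies the entry recursively; capital X adds execute only on
    # directories and on files that are already executable for someone.
    setfacl -R -m "u:${user}:rX" "$dir"
}
```

Running getfacl on the directory afterwards shows the resulting entries. The parent directories must also allow the other user to traverse them.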