In this issue
- Outage of $SCRATCH, during which all jobs will be killed
- Use $SCRATCH, not /scratch/netid/
- Directories with thousands of small files take thousands of times longer to back up than a single large file
NYU HPC Wiki: http://wikis.nyu.edu/display/NYUHPC
HPC consultations and walk-in Data Services assistance at level 5, Bobst library.
For more information or to book a consultation contact us at firstname.lastname@example.org.
New $SCRATCH filesystem this January
During January we will be replacing our aging Lustre ($SCRATCH) filesystem with a new, much faster one. The upgrade will involve an outage of $SCRATCH, during which all jobs will be killed.
We will publish the dates and duration of the outage at least a week in advance; please keep this in mind when planning long jobs and long workflows.
During the outage the old /scratch filesystem will be replaced with the new one. After the outage we will copy all data from the old /scratch to $SCRATCH/old-scratch/ on the new filesystem. This might take a few days, so if you expect to need some scripts and data from /scratch immediately after the outage, we recommend making a copy of those files in $WORK beforehand.
After the upgrade, flushing will be re-enabled for files untouched for more than 60 days.
The path for $SCRATCH may change with the upgrade, so for portability we strongly recommend using $SCRATCH, not /scratch/netid/, in scripts.
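As a minimal sketch (the project directory name here is hypothetical), build paths from the environment variable rather than hard-coding the mount point:

```shell
# Build paths from $SCRATCH so scripts keep working if the mount point moves.
PROJECT_DIR="$SCRATCH/myproject"          # portable: follows the variable
# PROJECT_DIR=/scratch/netid/myproject    # avoid: hard-coded path may break
echo "$PROJECT_DIR"
```

Any script written this way needs no changes if the underlying path moves after the upgrade.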
Please tar directories before copying to /archive
$ARCHIVE is intended for long-term storage of infrequently-accessed data, and is regularly copied to tape for safekeeping. The time taken to copy to tape depends on the number of files, not the total size of the data. Directories with thousands of small files take thousands of times longer to back up than a single large file. There are notes about using $ARCHIVE on the HPC Wiki.
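For example (the directory and file names below are placeholders), bundle a results directory into a single compressed tarball before copying it over:

```shell
# Create a stand-in directory of small files for illustration.
mkdir -p myresults
echo "data" > myresults/run1.txt

# Bundle the whole directory into one file: one large file backs up
# far faster than thousands of small ones.
tar -czf myresults.tar.gz myresults/

# cp myresults.tar.gz "$ARCHIVE/"   # then copy the single archive to $ARCHIVE
```

Use `tar -xzf myresults.tar.gz` to unpack the directory again after retrieving it.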
Skytree Infinity machine learning software on Mercer
NYU is discussing a 10-year partnership with Skytree, which is considering donating its "Skytree Infinity" machine-learning software (http://www.skytree.net/products/skytree-infinity) for use by NYU faculty and students. If you foresee this being useful to your research, you can support the discussion by contacting email@example.com with a one-or-two line summary of the research you have in mind.
Job submission tips
You can get your jobs running sooner on Mercer by requesting only the resources you need - the scheduler can more easily find a place to run a 1-CPU, 4GB-memory job than a 20-CPU, 189GB-memory job.
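As a sketch, assuming a PBS/Torque-style scheduler on Mercer (the directive values, directory, and program name below are illustrative, not prescribed by this newsletter), a small job's script header might request only what it needs:

```shell
#!/bin/bash
# Illustrative PBS job-script fragment: request only the resources the job uses.
#PBS -l nodes=1:ppn=1        # 1 node, 1 CPU
#PBS -l mem=4gb              # 4GB of memory
#PBS -l walltime=24:00:00    # 24 hours of wall-clock time

cd "$SCRATCH/myjob"          # hypothetical working directory
./my_program                 # hypothetical executable
```

A modest request like this slots into gaps in the schedule that a 20-CPU, 189GB request cannot use.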
One way the scheduler enforces fairness is to limit the number of "outstanding CPU-hours" allocated to the running jobs of any one user. So for example, you can have simultaneously:
- 240 single-CPU, 24-hour jobs (5760 CPU-hours outstanding), or
- 12 20-CPU, 24-hour jobs (5760 CPU-hours outstanding), or
- 6 20-CPU, 48-hour jobs (5760 CPU-hours outstanding), or
- 1 40-CPU, 6-day job (5760 CPU-hours outstanding), or
- 1 240-CPU, 1-day job (5760 CPU-hours outstanding)
After that point, any job you submit will queue until your allocated outstanding CPU-hours drop far enough below your outstanding CPU-hours budget to accommodate it.
12 20-CPU jobs (240 CPUs) free 240 CPU-hours of your budget each hour they run, so a 12-hour, 20-CPU job will be eligible to start just 1 hour later. But a 48-hour, 20-CPU job will need to wait 4 hours for space in the budget.
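The budget arithmetic above can be sketched as follows (the figures come from the examples in this newsletter; the actual budget is set by the scheduler, and these divisions happen to be exact):

```shell
# 12 running 20-CPU jobs free 240 CPU-hours of budget per hour.
freed_per_hour=$((12 * 20))

short_job=$((12 * 20))   # a 12-hour, 20-CPU job needs 240 CPU-hours
long_job=$((48 * 20))    # a 48-hour, 20-CPU job needs 960 CPU-hours

echo $(( short_job / freed_per_hour ))   # hours until the short job fits
echo $(( long_job / freed_per_hour ))    # hours until the long job fits
```

The first division gives 1 hour and the second 4 hours, matching the waits described above.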
There are more notes about job scheduling and requesting resources on the HPC Wiki.
Happy Holidays! from the HPC team
NYU HPC clusters will continue to operate over the Winter Break, but support will be limited and probably not immediate.
We wish you and your families the best for the holidays and the new year.