
rclone ("rsync for cloud storage") is a command-line program that syncs files and directories to and from cloud storage systems such as Google Drive, Amazon Drive, S3, and B2. rclone can be invoked in one of three modes:

  • Copy mode to just copy new/changed files
  • Sync (one way) mode to make a directory identical
  • Check mode to check for file hash equality
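
The three modes correspond to the rclone subcommands copy, sync, and check. The sketch below only prints the commands unless RUN=1 is set, so it is safe to try before a remote exists. The run helper, the SRC and DEST variables, the project paths, and the remote name remote1 are illustrative assumptions, not part of rclone.

```shell
# Hypothetical source directory and destination; remote1 stands for a
# remote created with "rclone config".
SRC="${SRC:-$HOME/project}"
DEST="${DEST:-remote1:project_backup}"

# Print a command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

run rclone copy "$SRC" "$DEST"    # copy new/changed files; deletes nothing at DEST
run rclone sync "$SRC" "$DEST"    # make DEST identical to SRC; may delete files at DEST
run rclone check "$SRC" "$DEST"   # compare sizes and hashes; transfers nothing
```

Because sync can delete files on the destination, copy is the safer choice for a first upload.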

rclone is available on the Prince cluster as the module rclone/1.35.

For large transfers, please try these options:
$ rclone --transfers=32 --checkers=16 --drive-chunk-size=16384k --drive-upload-cutoff=16384k copy source:sourcepath dest:destpath

These options work well for files from about 1 GB up to 250 GB. Keep in mind that uploads to Google Drive are rate-limited to 2 files per second, so transfers of many small files are slow. If you have many small files, tar the parent directory of those folders, split the tar file into 100 GB chunks, and upload the chunks to Google Drive.
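
The tar-and-split advice above can be sketched as a small shell function. The function name pack_and_split and its arguments are hypothetical; it relies only on the standard tar and split utilities and assumes the output location has enough free space for the archive.

```shell
# pack_and_split DIR PREFIX [CHUNK]
# Streams a tar archive of DIR through split, writing PREFIX.aa, PREFIX.ab, ...
# CHUNK uses split's size syntax and defaults to 100G, the chunk size
# suggested above. The body runs in a subshell so its variables stay local.
pack_and_split() (
    dir=$1
    prefix=$2
    chunk=${3:-100G}
    # -C changes into the parent directory so the archive stores relative paths.
    tar -cf - -C "$(dirname "$dir")" "$(basename "$dir")" | split -b "$chunk" - "$prefix."
)
```

For example, with placeholder paths, `pack_and_split /scratch/NetID/small_files /scratch/NetID/chunks/small_files.tar` writes the chunks into an existing chunks directory, which can then be uploaded with rclone copy. To restore after downloading, concatenate the chunks in order and unpack them: `cat small_files.tar.* | tar -xf -`.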

 

Step 1:

Log in to Prince:
$ ssh -Y NetID@prince.hpc.nyu.edu
If necessary, please read the wiki page on how to log in to the HPC clusters.

Step 2:

First, load the rclone module:
$ module load rclone/1.35

Step 3:

Configure rclone and set up remote access to your Google Drive:
$ rclone config

rclone tries to open its config file and then shows a menu like the one below.
You can select any of the options; here we show how to set up a new remote.

2017/02/24 10:21:00 Config file "/home/ad95/.rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

Enter n to create a new remote connection, then give it a name.

name> remote1

Next, choose the type of storage for the remote. Here we set up a remote for Google Drive, which is option 7 in the list below.

Type of storage to configure.
Choose a number from below, or type in your own value
 1 / Amazon Drive
   \ "amazon cloud drive"
 2 / Amazon S3 (also Dreamhost, Ceph, Minio)
   \ "s3"
 3 / Backblaze B2
   \ "b2"
 4 / Dropbox
   \ "dropbox"
 5 / Encrypt/Decrypt a remote
   \ "crypt"
 6 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
 7 / Google Drive
   \ "drive"
 8 / Hubic
   \ "hubic"
 9 / Local Disk
   \ "local"
10 / Microsoft OneDrive
   \ "onedrive"
11 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
12 / Yandex Disk
   \ "yandex"
Storage> 7

Then you see a few messages like the ones below:

Google Application Client Id - leave blank normally.
client_id> (just press enter key here) 
Google Application Client Secret - leave blank normally.
client_secret> (just press the enter key here)

Because you are accessing the cluster remotely, decline auto config by entering n:

Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine or Y didn't work
y) Yes
n) No
y/n> n

You will see a message similar to the one below:

If your browser doesn't open automatically go to the following link: https://accounts.google.com/o/oauth2/auth?client_id=202264815644.apps.googleusercontent.com&redirect_uri=urn...

Log in and authorize rclone for access.

Open this URL in a browser on your workstation, log in, and authorize rclone to access your Google Drive. Once that is done, the browser displays a verification code.
Copy the code from the browser and paste it into the terminal. Once the terminal accepts the verification code, it displays the options below:

y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

Select y if the remote looks correct, or e to edit it.
You can also view the currently configured remotes.

Step 4:

Transfer files to Google Drive using the command below:
$ rclone copy <source_folder> <remote_name>:<name_of_folder_on_gdrive>

For example:
$ rclone copy /home/user1 remote1:backup_home_user1
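
The copy command can be combined with the transfer options recommended earlier on this page. The following is a minimal sketch, assuming the rclone module is loaded and a remote such as remote1 has been configured; gdrive_copy and GDRIVE_OPTS are hypothetical names introduced here, not part of rclone.

```shell
# Transfer options recommended earlier on this page.
GDRIVE_OPTS="--transfers=32 --checkers=16 --drive-chunk-size=16384k --drive-upload-cutoff=16384k"

# gdrive_copy [extra rclone flags] SOURCE REMOTE:PATH
# Hypothetical helper: runs "rclone copy" with the options above.
# $GDRIVE_OPTS is left unquoted on purpose so it splits into separate flags.
gdrive_copy() {
    rclone copy $GDRIVE_OPTS "$@"
}
```

For example, `gdrive_copy --dry-run /home/user1 remote1:backup_home_user1` previews what would be transferred without copying anything; running the same command without --dry-run performs the transfer.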

Step 5:

The files are transferred, and you can find them in your Google Drive.
Note: rclone copies only files that are new or that differ from the files already on Google Drive.