Data Transfer

Last update: March 4, 2025

This guide describes methods for transferring data between TACC resources and your local machine. Transfer methods generally fall into two categories:

  1. Graphical User Interface (GUI) tools, e.g. Globus, Cyberduck
  2. Command-line interface (CLI) tools, e.g. scp, sftp, rsync

Important

Connection to third-party storage services, e.g. UT Box or Dropbox, is not supported.

Cyberduck

TACC staff recommends the open-source Cyberduck utility for both Windows and Mac users who do not already have a preferred tool.

Cyberduck is a free graphical user interface for data transfer and is an alternative to using the command line. With a drag-and-drop interface, it is easy to transfer a file from your local system to the remote secure system. You can use Cyberduck for Windows or macOS.

Download Cyberduck

Click the "Open Connection" button in the top left corner of the Cyberduck window to open a connection configuration window (as shown below). Select SFTP (SSH File Transfer Protocol) as the transfer mechanism, and type in the server name "frontera.tacc.utexas.edu". Add your username and password in the spaces provided. If the "More Options" area is not shown, click the small triangle or button to expand the window; this will allow you to enter the path to your project area so that when Cyberduck opens the connection you will immediately see your data. Then click the "Connect" button to open your connection.

Once installed, click "Open Connection" in the top left corner of your Cyberduck window.

Once connected, you can navigate through your remote file hierarchy using familiar graphical navigation techniques. You may also drag-and-drop files into and out of the Cyberduck window to transfer files to and from Frontera.

To set up a connection, type in the server name (host). Add your TACC username and password in the spaces provided. If the "More Options" area is not shown, click the small triangle button to expand the window; this will allow you to enter the path to your transfer directory, /transfer/directory/path, so that when Cyberduck opens the connection you will immediately be in your individualized transfer directory on the system. Click the "Connect" button to open your connection.

Figure 2. Cyberduck connection setup screen

Note

You will be prompted to "allow unknown fingerprint" upon connection. Select "allow" and then enter your TACC token value.

Once connected, you can navigate through your remote file hierarchy using the graphical user interface. You may also drag-and-drop files from your local computer into the Cyberduck window to transfer files to the system.

Globus Data Transfer Guide

Globus provides high-speed, reliable, asynchronous transfers to the portal. Globus is fast for large volumes of data because it uses multiple network sockets simultaneously to transfer data. It is reliable for large numbers of directories and files because it can automatically restart after a failure, and it notifies you only when the transfers have completed successfully.

This document leads you through the steps required to set up Globus for first-time use. Several steps will need to be repeated each time you set up a new computer to use Globus for the portal. Once you are set up, you can use Globus not only for transfers to and from the portal, but also to access other cyberinfrastructure resources at TACC and around the world.

To start using Globus, you need to do two things: generate a unique identifier, an ePPN (eduPersonPrincipalName), for all Globus services, and enroll the machine you are transferring data to or from with Globus. This can be your personal laptop or desktop, or a server to which you have access. Follow this one-time process to set up the Globus file transfer capability.

Note

Globus Transition. Globus has transitioned to version 5.4. This transition impacts all TACC researchers who use Globus and requires you to update your profile with an ePPN to continue using the Globus service. The use of "Distinguished Names", or DNs, is no longer supported.

Important

You must use your institution's credentials and not your personal email account (e.g. Google, Yahoo!, AOL) when setting up Globus. You will encounter problems with the transfer endpoints (e.g. Frontera, Stampede3, Corral, Ranch) if you use your personal account information.

Step 1. Retrieve your Unique ePPN.

Log in to CILogon and click on "User Attributes". Make note of your ePPN.

Figure 4. Make note of your ePPN

Step 2. Associate your ePPN with your TACC Account.

Log in to the TACC Accounts Portal, click "Account Information" in the left-hand menu, then add or edit your ePPN from Step 1.

Important

The institution (ePPN) listed in your TACC account profile must map to the ePPN you are using to log in to GlobusOnline.

Figure 5. Update your TACC user profile.

Tip

Once you update your ePPN, please allow up to 2 hours for the changes to propagate across TACC systems.

Step 3. Globus File Manager

Once you've completed these steps, you will be able to use the Globus File Manager as usual. If you encounter any issues, please submit a support ticket.

SSH Command-Line Tools

Transfer files between TACC HPC resources and other Linux-based systems using either scp or rsync. Both scp and rsync are available in the Mac Terminal app. Windows SSH clients typically include scp-based file transfer capabilities.

The scp and rsync commands are standard UNIX data transfer mechanisms used to transfer moderate-size files and data collections between systems. These applications use a single thread and transfer files one at a time. The scp and rsync utilities are typically the best methods when transferring gigabytes of data. For larger data transfers, parallel data transfer mechanisms, e.g., Grid Community Toolkit, can often improve total throughput and reliability.

Note

It is possible to use these command-line tools if your local machine runs Windows, but you will need to use an SSH client (e.g. Cyberduck).

To simplify the data transfer process, we recommend that Windows users follow the How to Transfer Data with Cyberduck guide as detailed above.

Using scp

The Linux scp (secure copy) utility is a component of the OpenSSH suite. Assuming your Lonestar6 username is bjones, a simple scp transfer that pushes a file named myfile from your local Linux system to Lonestar6 $HOME would look like this:

localhost$ scp ./myfile bjones@ls6.tacc.utexas.edu:  # note colon at end of line

You can use wildcards, but you need to be careful about when and where you want wildcard expansion to occur. For example, to push all files ending in .txt from the current directory on your local machine to /work/01234/bjones/ls6 on Lonestar6:

localhost$ scp *.txt bjones@ls6.tacc.utexas.edu:/work/01234/bjones/ls6

To delay wildcard expansion until reaching Lonestar6, use a backslash (\) as an escape character before the wildcard. For example, to pull all files ending in .txt from /work/01234/bjones/ls6 on Lonestar6 to the current directory on your local system:

localhost$ scp bjones@ls6.tacc.utexas.edu:/work/01234/bjones/ls6/\*.txt .

Note

Using scp with wildcard expansion on the remote host is unreliable. Specify absolute paths wherever possible.
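
The difference between local and remote wildcard expansion can be checked without contacting any server. The following is a minimal sketch, assuming a bash shell. The show_args function is a hypothetical helper (it is not part of scp) that prints each argument exactly as a command such as scp would receive it, and the two .txt files are throwaway samples created in a temporary directory.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints one received argument per line,
# which is what scp would see after the local shell has processed the command.
show_args() { printf '%s\n' "$@"; }

workdir="$(mktemp -d)"          # throwaway directory holding two sample files
cd "$workdir" || exit 1
touch a.txt b.txt

# Unescaped: the local shell expands *.txt before the command starts.
local_expansion="$(show_args *.txt)"

# Escaped: the backslash is consumed locally and the * is passed through
# unchanged, leaving the expansion to the remote side.
remote_pattern="$(show_args bjones@ls6.tacc.utexas.edu:/work/01234/bjones/ls6/\*.txt)"

echo "expanded locally:"
echo "$local_expansion"
echo "passed to the remote host:"
echo "$remote_pattern"
```

Prefixing a real scp command with echo gives the same kind of preview before anything is transferred.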

You can of course use shell or environment variables in your calls to scp. For example:

localhost$ destdir="/work/01234/bjones/ls6/data"
localhost$ scp ./myfile bjones@ls6.tacc.utexas.edu:$destdir

You can also issue scp commands on your local client that use Lonestar6 environment variables like $HOME, $WORK, and $SCRATCH. To do so, use a backslash (\) as an escape character before the $; this ensures that expansion occurs after establishing the connection to Lonestar6:

localhost$ scp ./myfile bjones@ls6.tacc.utexas.edu:\$SCRATCH/data   # Note backslash
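
To see what the backslash changes, the sketch below builds the destination argument three ways and prints each one. It assumes a bash shell and, purely for illustration, sets a local SCRATCH variable to a made-up value so that accidental local expansion is easy to spot. Single quotes are shown as an alternative to the backslash; both keep the local shell from expanding the variable.

```shell
#!/usr/bin/env bash
# Assumption for illustration only: a local SCRATCH variable with a made-up value.
SCRATCH="/local/scratch"

unescaped="bjones@ls6.tacc.utexas.edu:$SCRATCH/data"    # expanded on the local machine
escaped="bjones@ls6.tacc.utexas.edu:\$SCRATCH/data"     # backslash: $SCRATCH passed through
quoted='bjones@ls6.tacc.utexas.edu:$SCRATCH/data'       # single quotes: same effect

printf '%s\n' "$unescaped" "$escaped" "$quoted"
```

Only the first form contains the local path; the other two hand the literal text $SCRATCH/data to scp for the remote system to interpret.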

Avoid using scp for recursive transfers of directories that contain nested directories of many small files:

localhost$ scp -r ./mydata     bjones@ls6.tacc.utexas.edu:\$SCRATCH  # DON'T DO THIS

Instead, use tar to create an archive of the directory, then transfer the directory as a single file:

localhost$ tar cvf ./mydata.tar mydata                                  # create archive
localhost$ scp     ./mydata.tar bjones@ls6.tacc.utexas.edu:\$WORK  # transfer archive
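
Before transferring an archive, it can be worth confirming that it contains everything in the directory. The following is a minimal sketch under stated assumptions: the mydata directory and its contents are sample data created in a temporary location, and the scp step is left as a comment because it requires a connection to Lonestar6.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"                 # throwaway location for the demo
cd "$workdir"

# Sample data: a nested directory holding several small files.
mkdir -p mydata/run1 mydata/run2
for i in 1 2 3; do
  echo "sample $i" > "mydata/run1/part$i.dat"
  echo "sample $i" > "mydata/run2/part$i.dat"
done

tar cf mydata.tar mydata               # create the archive (add v for verbose output)

# Compare the number of files on disk with the number of files in the archive.
# tar lists directories with a trailing slash, so those lines are excluded.
expected="$(find mydata -type f | wc -l)"
archived="$(tar tf mydata.tar | grep -c -v '/$')"
echo "files on disk: $expected, files in archive: $archived"

# The single archive can then be transferred as in the example above:
#   scp ./mydata.tar bjones@ls6.tacc.utexas.edu:\$WORK
```

On the receiving system, tar xf mydata.tar unpacks the directory again.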

Consult the scp man pages for more information:

login1$ man scp

Transferring Files with rsync

The rsync (remote synchronization) utility is another way to keep your data up to date. In contrast to scp, rsync transfers only the changed parts of a file instead of the entire file, so this selective method of data transfer can be much more efficient than scp. The following examples use rsync to transfer a file named mybigfile, and then a directory named mybigdir, from your local system to the data directory under $WORK on Frontera.

localhost$ rsync       mybigfile bjones@frontera.tacc.utexas.edu:\$WORK/data
localhost$ rsync -avtr mybigdir  bjones@frontera.tacc.utexas.edu:\$WORK/data

The options on the second transfer are typical and appropriate when syncing a directory: this is a recursive update (-r) with verbose (-v) feedback; the synchronization preserves time stamps (-t) as well as symbolic links and other metadata (-a). Archive mode (-a) already implies both -r and -t, so listing them separately is harmless but not required. Because rsync only transfers changes, recursive updates with rsync may be less demanding than an equivalent recursive transfer with scp.

Tip

See Good Conduct for additional important advice about striping the receiving directory when transferring large files; watching your quota on $HOME and $WORK; and limiting the number of simultaneous transfers.

Tip

Remember that $STOCKYARD (and your $WORK directory on each TACC resource) is available from several other TACC systems: there's no need for scp when both the source and destination involve subdirectories of $STOCKYARD.

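The following example transfers a file named myfile.c from its current location on Stampede to the /data/01698/TACC-username/data directory on Frontera:

login1$ rsync myfile.c \
TACC-username@frontera.tacc.utexas.edu:/data/01698/TACC-username/data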

An entire directory can be transferred from source to destination by using rsync as well. For directory transfers the options -avtr will transfer the files recursively (-r option) along with the modification times (-t option) and in the archive mode (-a option) to preserve symbolic links, devices, attributes, permissions, ownerships, etc. The -v option (verbose) increases the amount of information displayed during any transfer. The following example demonstrates the usage of the -avtr options for transferring a directory named gauss from the present working directory on Stampede to a directory named data in the $WORK file system on Frontera.

login1$ rsync -avtr ./gauss \
TACC-username@frontera.tacc.utexas.edu:/data/01698/TACC-username/data

For more rsync options and command details, run the command rsync -h or:

login1$ man rsync

Warning

When executing multiple instantiations of any of the commands listed above, scp, sftp and rsync, limit your active transfers to no more than 2-3 processes at a time.
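
One way to stay within this limit when many files are queued is to let xargs cap the number of simultaneous commands. This is only a sketch under several assumptions: an xargs that supports the -P option (GNU and BSD versions do; it is not part of POSIX), five sample archives created in a temporary directory, and a destination borrowed from the scp examples above. The leading echo makes it a dry run that prints the commands instead of running them. Real parallel transfers would also need authentication that does not prompt on the terminal for every process.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions for illustration: five sample archives and a hypothetical destination.
workdir="$(mktemp -d)"
for i in 1 2 3 4 5; do echo "sample $i" > "$workdir/part$i.tar"; done
dest='bjones@ls6.tacc.utexas.edu:$SCRATCH/data'   # single quotes: no local expansion

# With -P 2, xargs keeps at most two commands running at any time.
# The "echo" in front of scp only prints each command; remove it to transfer.
planned="$(printf '%s\n' "$workdir"/part*.tar | xargs -P 2 -I{} echo scp {} "$dest")"
echo "$planned"
```

All five commands are eventually issued, but never more than two at once.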

Transfer with sftp

sftp is a file transfer program that allows you to interactively navigate between your local file system and the remote secure system. To transfer a file (e.g. my_file.txt) to the remote secure system via sftp, open a terminal on your local computer and navigate to the directory where your data file is located.

On Mac:
localhost$ cd ~/Documents/portal-data/

On Windows:
localhost$ cd %HOMEPATH%\Documents\portal-data\

For example, assume your TACC username is bjones and you have an allocation on Stampede3. An sftp session that pushes my_file.txt from the current directory of your local computer to your transfer directory on TACC's Stampede3 system would begin like this:

localhost$ sftp bjones@stampede3.tacc.utexas.edu:/transfer/directory/path
Password:
TACC Token Code:
Connected to host.
Changing to:
  /transfer/directory/path
sftp>

Once you've logged into the remote secure system and have been redirected to your transfer directory, confirm your location on the server:

sftp> pwd
Remote working directory:
/transfer/directory/path

To list the files currently in your transfer directory:

sftp> ls
utaustin_dir.txt

To list the files currently in your local directory:

sftp> lls
my_file.txt

Tip

The leading l in the lls command denotes that you are listing the contents of your local working directory.

To transfer my_file.txt from your local computer to your transfer directory:

sftp> put my_file.txt
Uploading my_file.txt to /transfer/directory/path
my_file.txt     100% ##  #.#          KB/s   ##:#

To double-check if the transfer succeeded:

sftp> ls
my_file.txt
utaustin_dir.txt

To exit out of sftp on the terminal:

sftp> bye
localhost$

UTBox and other Third-Party Storage Services

Unfortunately, TACC does not allow direct access from UT Box or other third-party storage services such as Dropbox, Google or Amazon storage services. To transfer files from one of these services:

  1. Manually download the files from the service to your laptop.
  2. Using one of the tools outlined in this document (e.g. scp or Cyberduck), upload the files from your laptop to the desired TACC resource (e.g. Stampede3, Frontera).

If you have files stored at another university, see the Globus instructions above.