CDTools at TACC
Last update: April 11, 2024
Leveraging each node's /tmp directory can minimize the I/O load on the global Lustre file system and improve I/O performance. Due to its limited size, the /tmp space is appropriate for executables/binaries, frequently used object files, and small common files, e.g., global configuration files or initial/pre-processed data files.
Collect-Distribute (CDTools) has been designed and developed to distribute files or directories to or from each compute node's /tmp directory.
CDTools has two utilities:
- distribute.bash - copies binaries and frequently accessed input files to the local /tmp space on each compute node before computation, when a job starts.
- collect.bash - copies output files and log files back to $WORK or $SCRATCH after computation is complete, before the job finishes.
You can employ CDTools within a job script, or interactively within an idev session.
Using CDTools
CDTools is currently installed on TACC's Stampede3, Frontera, and Lonestar6 resources.
1. Initialize the CDTools Environment
Load the cdtools module in your job script or within an idev session:
$ module load cdtools
2. Distribute Files to Each Node's /tmp Space
Distribute your files/directories to the local /tmp space of each compute node allocated to your job:
$ distribute.bash ${SCRATCH}/inputfile #put the full path of your input file here
or
$ distribute.bash ${SCRATCH}/inputdir #put the full path of the directory of your input files here
If you ssh to those compute nodes after running the above command, you will find an identical copy of your input file or directory in the /tmp directory on each node.
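On a single node, the end state distribute.bash produces is equivalent to a plain copy into /tmp. The following stand-alone sketch illustrates that end state; all paths and names here are illustrative stand-ins, not CDTools defaults:

```shell
#!/bin/bash
# Single-node illustration of the state distribute.bash leaves behind:
# after distribution, every allocated node's /tmp holds an identical copy.
# All paths below are illustrative stand-ins, not part of CDTools.
src=$(mktemp -d)/inputdir             # stand-in for ${SCRATCH}/inputdir
mkdir -p "$src"
echo "k = 1" > "$src/params.txt"      # a small shared input file

cp -r "$src" "/tmp/inputdir_demo_$$"  # the per-node copy CDTools performs
staged=$(ls "/tmp/inputdir_demo_$$")  # what you would see after ssh-ing in
echo "staged in /tmp: $staged"

rm -rf "$(dirname "$src")" "/tmp/inputdir_demo_$$"   # tidy up
```

On a real multi-node job, distribute.bash repeats this copy on every node in the allocation, so each node sees its own private replica.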
3. Collect your Output Files
Important
Each node's /tmp directory is purged once a job ends and before the node is released back into the pool of available nodes.
Collect the job output files from the /tmp space of each node using the collect.bash script. Place this command at the end of your job script, or issue it at the end of your idev session:
$ collect.bash /tmp/outputdir ${SCRATCH}/output_collected
or
$ collect.bash /tmp/outputfile ${SCRATCH}/output_collected
You will obtain a list of the output files or directories copied back to your target directory in $SCRATCH. Each copied file or directory name is appended with an underscore and a number indicating the rank of the compute node it came from. For example, for a job run on four nodes, the files outputfile_0, outputfile_1, outputfile_2, and outputfile_3 will all be placed in the output_collected directory.
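The rank-suffix naming can be sketched locally; this minimal illustration fabricates the four per-node files and shows a typical post-processing step (the directory and file names are stand-ins, not actual collect.bash output):

```shell
#!/bin/bash
# Illustrate the rank-suffix naming collect.bash uses: each node's copy
# of /tmp/outputfile lands as outputfile_<rank> in the target directory.
# The names below are illustrative stand-ins.
dest=$(mktemp -d)                    # stand-in for ${SCRATCH}/output_collected
for rank in 0 1 2 3; do              # pretend the job ran on four nodes
    echo "result from node $rank" > "$dest/outputfile_$rank"
done

# A typical post-processing step: merge the per-node pieces in rank order.
cat "$dest"/outputfile_{0..3} > "$dest/outputfile_all"
lines=$(wc -l < "$dest/outputfile_all")
echo "merged $lines per-node results"
rm -rf "$dest"
```

Merging in rank order keeps the combined output deterministic across runs, since the suffix reflects node rank rather than completion time.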
Sample Job Script
#!/bin/bash
#SBATCH -J testrun # Job name
#SBATCH -o CDtest.%j.out # Name of stdout output file (%j expands to jobId)
#SBATCH -e CDtest.%j.err # Name of stderr output file (%j expands to jobId)
#SBATCH -p development # Queue name
#SBATCH -N 2 # Total number of nodes requested
#SBATCH -n 16 # Total number of cores requested
#SBATCH -t 00:30:00 # Run time (hh:mm:ss) - 30 minutes
#SBATCH -A P-1234567 # <-- Allocation name to charge job against
# 0. Preparation
module load cdtools
# 1. Run distribute: Distribute input files and directories
# to /tmp on each compute node.
# Distribute your programs/binaries if necessary.
distribute.bash ${SCRATCH}/datafiles/inputfile
distribute.bash ${SCRATCH}/datafiles/inputdir
wait
# 2. Run your application here!
ibrun ./myapp
wait
# 3. Run collect: Collect output files and directories from /tmp.
# All important data files must be archived
# to $WORK or $SCRATCH before the job finishes!
collect.bash /tmp/outputdir ${SCRATCH}/datafiles/new_output_collected
Notes
- This tool works in both batch mode and interactive mode.
- Always test your workflow with CDTools before any substantial production runs to ensure required files are successfully distributed and collected.
- Users should still understand and respect the /tmp size limit and other I/O rules.
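Since the /tmp limit is easy to exceed, a simple pre-flight check before calling distribute.bash can help. This sketch uses a throwaway file in place of your real input; substitute your own path:

```shell
#!/bin/bash
# Sketch: check that a dataset fits in /tmp before distributing it.
# The input file here is a throwaway stand-in for your real data.
input=$(mktemp)                       # stand-in for ${SCRATCH}/inputfile
dd if=/dev/zero of="$input" bs=1024 count=4 2>/dev/null

need_kb=$(du -sk "$input" | cut -f1)             # KB of data to stage
avail_kb=$(df -kP /tmp | awk 'NR==2 {print $4}') # free KB in /tmp

if [ "$need_kb" -lt "$avail_kb" ]; then
    verdict="fits"                    # safe to distribute to /tmp
else
    verdict="too-large"               # keep it on $SCRATCH instead
fi
echo "$need_kb KB needed, $avail_kb KB free in /tmp: $verdict"
rm -f "$input"
```

Note that the check reflects only the node where it runs; on a heterogeneous allocation, leave generous headroom.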