AlphaFold3 at TACC

Last update: April 8, 2025

AlphaFold3 is Google DeepMind's latest deep learning model for predicting the structure and interactions of biological macromolecules, including proteins, nucleic acids, small molecules, ions, and post-translational modifications. AlphaFold3 significantly expands the capabilities of AlphaFold2. It delivers highly accurate predictions of multi-molecule complexes rather than of protein folding alone. In November 2024, the developers released the source code on GitHub and published a Nature paper (with supplementary information) describing the method. In addition to the software, AlphaFold3 depends on ~253 GB of databases and model parameters. Researchers interested in making structure predictions with AlphaFold3 are encouraged to follow the guide below and use the databases TACC has already prepared.

Installations at TACC

Important

To run AlphaFold3 on TACC systems, you must obtain the model parameters directly from Google by completing this form.

See Google's AlphaFold 3 Model Parameters Terms of Use

Table 1. Installations at TACC

| HPC Resource | Version | Data | Examples | Module |
|---|---|---|---|---|
| Lonestar6 | AlphaFold3 v3.0.1 | /scratch/tacc/apps/bio/alphafold3/3.0.1/data | /scratch/tacc/apps/bio/alphafold3/3.0.1/examples | /scratch/tacc/apps/bio/alphafold3/modulefiles |
| Frontera | AlphaFold3 v3.0.1 | Coming soon | Coming soon | Coming soon |
| Stampede3 | AlphaFold3 v3.0.1 | Coming soon | Coming soon | Coming soon |
| Vista | AlphaFold3 v3.0.1 | Coming soon | Coming soon | Coming soon |

Access

Due to AlphaFold's licensing restrictions, users must obtain the model parameters directly from Google DeepMind. To obtain the model parameters:

  1. Visit the following form: AlphaFold3 Model Request Form
  2. Submit the request using an institutional email address.
  3. Once approved, you will receive instructions for downloading a folder containing the model parameters.

After downloading, you must place the model parameters yourself in a directory you own, for example $WORK/af3_weights. Your job script will point to this directory through the AF3_MODEL_PARAMETERS_DIR variable.
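The commands below are a minimal sketch that creates a directory for the parameters and restricts it to your own account. The location $WORK/af3_weights matches the example used later in the job script. Two choices here are our own assumptions, not TACC requirements: the fallback to $HOME when $WORK is unset, and the chmod 700 step. The source path in the commented cp line is a placeholder for wherever your download landed.

```shell
#!/bin/bash
# Sketch: prepare a private directory for the AlphaFold3 model parameters.
# Assumption: $WORK/af3_weights is the chosen location ($HOME is used if $WORK is unset).
AF3_MODEL_PARAMETERS_DIR="${WORK:-$HOME}/af3_weights"
mkdir -p "$AF3_MODEL_PARAMETERS_DIR"

# The parameters are licensed to you individually, so keep the directory
# readable only by your account (a suggested precaution).
chmod 700 "$AF3_MODEL_PARAMETERS_DIR"

# Copy the contents of the downloaded folder here (placeholder source path):
#   cp /path/to/downloaded/parameters/* "$AF3_MODEL_PARAMETERS_DIR"/

ls -ld "$AF3_MODEL_PARAMETERS_DIR"
```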

Note: TACC cannot distribute the AlphaFold3 model weights.

Running AlphaFold3

Important

AlphaFold3 is still being tested for performance and I/O efficiency on TACC systems, so the instructions below are subject to change.

Directory Structure

We highly recommend running AlphaFold3 from your $SCRATCH directory. A typical working directory may look like this:

alphafold3_project/
├── input/
│   └── example_input.json
├── output/
└── slurm_jobs/
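The layout above can be created with a short loop. In this sketch the project is assumed to live at $SCRATCH/alphafold3_project. If $SCRATCH is unset, for example when trying the commands off-cluster, the current directory is used instead. A plain loop is used rather than brace expansion so the commands also work in shells other than bash.

```shell
#!/bin/bash
# Sketch: create the suggested AlphaFold3 project layout.
# Assumption: project root is $SCRATCH/alphafold3_project (falls back to $PWD).
PROJECT_DIR="${SCRATCH:-$PWD}/alphafold3_project"

for sub in input output slurm_jobs; do
    mkdir -p "$PROJECT_DIR/$sub"
done

ls "$PROJECT_DIR"
```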

Input Preparation

AlphaFold3 expects a single .json input file describing the molecular system to be predicted. The input format is detailed in the official DeepMind documentation. Upload this file to your $WORK or (recommended) $SCRATCH space. Sample .json input files are provided in the machine-specific "Examples" path listed in Table 1 above.

A minimal input file describing a single protein chain may look like:

{
  "name": "UQCR11_Hsapiens",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVTRFLGPRYRELVKNWVPTAYTWGAVGAVGLVWATDWRLILDWVPYINGKFKKDN"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
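One way to create this file directly on a login node is a quoted heredoc, as in the sketch below. The project path and the file name example_input.json follow the directory layout shown earlier; both are assumptions you should adjust. The optional syntax check uses Python's standard json.tool module and runs only if python3 is available.

```shell
#!/bin/bash
# Sketch: write the example input JSON into the project's input/ directory.
# Assumption: project root is $SCRATCH/alphafold3_project (falls back to $PWD).
PROJECT_DIR="${SCRATCH:-$PWD}/alphafold3_project"
mkdir -p "$PROJECT_DIR/input"
INPUT_JSON="$PROJECT_DIR/input/example_input.json"

# The quoted 'EOF' stops the shell from expanding anything inside the JSON
cat > "$INPUT_JSON" <<'EOF'
{
  "name": "UQCR11_Hsapiens",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVTRFLGPRYRELVKNWVPTAYTWGAVGAVGLVWATDWRLILDWVPYINGKFKKDN"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF

# Optional syntax check (only if python3 is available on the login node)
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$INPUT_JSON" > /dev/null && echo "Valid JSON: $INPUT_JSON"
fi
```

The syntax check catches only malformed JSON. It does not confirm that the fields follow the AlphaFold3 input schema.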

SLURM Job Script Preparation

Next, prepare a batch job submission script for running AlphaFold3. Model inference must be run on a GPU. See the AlphaFold3 performance documentation for more information on executing AlphaFold3 jobs in stages to optimize resource utilization.

Templates for batch job submission scripts are provided in the "Examples" paths listed in Table 1 above. The templates must be customized before they can be used. Copy the desired template to your $WORK or $SCRATCH space along with the input .json file, then edit it. After the necessary customizations, a batch script for running AlphaFold3 on Lonestar6 may look like this:

#!/bin/bash
#----------------------------------------------------------------------
#SBATCH -J AF3_protein             # Job name
#SBATCH -o AF3_protein.o%j         # Name of stdout output file
#SBATCH -e AF3_protein.e%j         # Name of stderr error file
#SBATCH -p gpu-a100                # Queue (partition) name
#SBATCH -N 1                       # Total # of nodes
#SBATCH -t 01:00:00                # Run time (hh:mm:ss)
#SBATCH -A my-project              # Allocation name
#----------------------------------------------------------------------

# Load required modules
module unload xalt
module use /scratch/tacc/apps/bio/alphafold3/modulefiles
module load alphafold3/3.0.1-ctr.lua

# Set environment variables pointing to your input, output, and model parameters directories
export AF3_INPUT_DIR=$SCRATCH/input/
export AF3_OUTPUT_DIR=$SCRATCH/output/
export AF3_MODEL_PARAMETERS_DIR=$WORK/af3_weights

# The output directory must exist before it can be bind-mounted into the container
mkdir -p "$AF3_OUTPUT_DIR"

# Run the AlphaFold3 container.
# AF3_DATABASES_DIR, AF3_IMAGE, and AF3_CODE_DIR are provided by the alphafold3 module.
# MODIFY: replace input.json below with the name of your input file.
# Nothing (no spaces, no comments) may follow a line-continuation backslash.
apptainer exec \
     --nv \
     --bind "$AF3_INPUT_DIR":/root/af_input \
     --bind "$AF3_OUTPUT_DIR":/root/af_output \
     --bind "$AF3_MODEL_PARAMETERS_DIR":/root/models \
     --bind "$AF3_DATABASES_DIR":/root/public_databases \
     "$AF3_IMAGE" \
     python "${AF3_CODE_DIR}"/run_alphafold.py \
     --json_path=/root/af_input/input.json \
     --model_dir=/root/models \
     --db_dir=/root/public_databases \
     --output_dir=/root/af_output

In the batch script, set the following to values appropriate for the machine you are running on: the partition (queue) with #SBATCH -p, the node count and wall-clock limit with #SBATCH -N and #SBATCH -t, and the allocation name with #SBATCH -A. Also make sure the path on the module use line matches the machine-specific "Module" path listed in Table 1 above.

When preparing a batch job script to run AlphaFold3, you must set several environment variables that point to your input, output, and model parameters directories. The table below describes each variable and whether you need to change it.

Table 2. Required Variables to Set in Job Script

| Variable | What it does | User action required |
|---|---|---|
| AF3_INPUT_DIR | Directory containing the input .json file | Set to the location of your input files (e.g., $SCRATCH/input) |
| AF3_OUTPUT_DIR | Directory where output will be written | Set to your desired output path (e.g., $SCRATCH/output) |
| AF3_MODEL_PARAMETERS_DIR | Directory where you manually downloaded and extracted the AlphaFold3 model weights | Set to where you stored the model weights (e.g., $WORK/af3_weights) |
| AF3_DATABASES_DIR | Location of the shared AlphaFold3 database files on TACC systems | None; do not modify |
| AF3_IMAGE | Path to the AlphaFold3 container image | None; do not modify |
| AF3_CODE_DIR | Location of the AlphaFold3 source code inside the container | None; do not modify |
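A mistyped path usually shows up only after the job has waited in the queue, when the apptainer bind fails. The sketch below defines a hypothetical helper, check_af3_env (our own name, not part of the TACC module), that checks each variable from Table 2 before the container starts. It confirms that the variable is set and that it points to an existing path on the host. AF3_CODE_DIR is skipped because it refers to a path inside the container. The helper reads variables with printenv, so it only sees exported variables. The example job script exports its own variables, and the module is assumed to set the others as environment variables.

```shell
#!/bin/bash
# Hypothetical helper: verify the Table 2 variables before launching AlphaFold3.
# Returns 0 if all checks pass, 1 otherwise (errors are printed to stderr).
check_af3_env() {
    local status=0
    local var value
    # AF3_CODE_DIR is omitted: it is a path inside the container, not on the host
    for var in AF3_INPUT_DIR AF3_OUTPUT_DIR AF3_MODEL_PARAMETERS_DIR \
               AF3_DATABASES_DIR AF3_IMAGE; do
        value=$(printenv "$var")
        if [ -z "$value" ]; then
            echo "ERROR: $var is not set" >&2
            status=1
        elif [ ! -e "$value" ]; then
            echo "ERROR: $var=$value does not exist" >&2
            status=1
        fi
    done
    return $status
}
```

To use it, paste the function into your job script and call check_af3_env || exit 1 just before the apptainer exec line. The job then stops immediately with a readable message instead of failing inside the container.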

Once the input .json and customized batch script are prepared, submit to the job queue with:

login1$ sbatch <job_script>

e.g.:

login1$ sbatch AF3_protein.slurm

We are currently benchmarking AlphaFold3 on TACC systems. Refer to the AlphaFold3 performance documentation for runtime estimates based on the number of tokens and the available hardware. Refer to the AlphaFold3 output documentation for a description of the expected output files.
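After a job finishes, you may want to find the best-scoring prediction. The sketch below assumes three things, so confirm each against the AlphaFold3 output documentation first. First, AlphaFold3 writes a per-job folder inside your output directory. Second, that folder contains a file named ranking_scores.csv. Third, the file has one header line, with the ranking score in the third column (seed,sample,ranking_score). The function name best_af3_sample is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the highest-ranked "seed,sample,score" row of a job.
# Assumption: <job_folder>/ranking_scores.csv has a header line and the
# ranking score in column 3.
best_af3_sample() {
    local csv="$1/ranking_scores.csv"
    if [ ! -f "$csv" ]; then
        echo "No ranking_scores.csv in $1" >&2
        return 1
    fi
    # Skip the header, sort numerically (descending) on column 3, keep the top row.
    # LC_ALL=C keeps "." as the decimal separator regardless of the user's locale.
    tail -n +2 "$csv" | LC_ALL=C sort -t, -k3,3 -n -r | head -n 1
}
```

For example, best_af3_sample "$SCRATCH/output/<job_folder>" prints a single line such as 1,1,0.93. Replace <job_folder> with the folder AlphaFold3 created for your job.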

References