Horizon User Guide
Last update: April 16, 2026
This user guide is in progress
Notices
- Subscribe to Horizon User News. Stay up-to-date on Horizon's status, scheduled maintenance, and other notifications. (04/15/2026)
Introduction
Horizon is a National Science Foundation-funded system that is part of the Leadership Class Computing Facility (LCCF) award, Award Abstract #2323116. Please reference TACC when providing any citations.
Allocations
- NAIRR: https://nairrpilot.org/
- TxRAS: Submit to opportunities, including LRAC, at https://submit-tacc.xras.org/
System Specifications
Developed in collaboration with Dell Technologies, NVIDIA, VAST Data, Spectra Logic, Versity, and Sabey Data Centers, the Horizon supercomputer combines cutting-edge technologies with advanced infrastructure to redefine what is possible in scientific computing.
| Specification | Value |
|---|---|
| Performance (GPU component only) | 160 petaflops (FP64 units)<br>320 petaflops (FP32 units)<br>~288 petaflops |
| AI Performance | 20 exaflops for AI at BF16/FP16<br>40 exaflops for AI at FP8<br>80 exaflops for AI at FP4 |
| Scale | NVIDIA Grace Blackwell platform featuring 4,000 GPUs and NVIDIA Vera Vera servers featuring 4,752 nodes |
| Networking | Interconnected by the NVIDIA Quantum-X800 InfiniBand networking platform |
| Local All-Solid-State Storage | 400 PB of storage delivering more than 8 TB/s of read/write bandwidth, along with multi-tenancy and Quality-of-Service capabilities. |
| Efficiency | Up to 6x more energy efficient, powered by a new 20 MW data center with advanced liquid cooling in Round Rock, Texas. |
Grace Blackwell Compute Nodes
Horizon hosts 2,000 Grace Blackwell (GB) nodes. The GB subsystem consists of nodes based on the GB200 NVL4 platform. The NVL4 platform is configured as two nodes, each with two NVIDIA Blackwell GPUs (185 GiB of HBM3 memory each) and one Grace CPU (72 cores, 120 GiB of LPDDR5X memory).
A GB node provides 80 TFlops of FP64 performance (~160 TFlops DGEMM performance using NVML) and 20 PFlops of FP16 performance for ML workflows. The GB subsystem is housed in 28 racks, each containing 72 GB nodes. These nodes connect via an NVIDIA InfiniBand 800 Gb/s fabric to NVIDIA XDR InfiniBand switches using a fully connected two-level fat-tree topology.
Table 1. Grace Blackwell Specifications
| Specification | Value |
|---|---|
| GPU: | Blackwell, GB200 |
| GPU Memory: | 185 GiB |
| CPU: | NVIDIA Grace CPU |
| Total CPU cores per node: | 72 cores on one socket |
| Hardware threads per core: | 1 |
| Hardware threads per node: | 72 |
| Clock rate: | 3.4 GHz |
| Memory: | 240 GiB LPDDR |
| Cache: | 64 KB L1 data cache per core; 1MB L2 per core; 114 MB L3 |
| Local storage: | 130 GiB |
| DRAM: | LPDDR5 |
Vera Vera Compute Nodes
Table 2. Vera Vera Specifications
| Specification | Value |
|---|---|
| CPU: | NVIDIA Vera CPU |
| Total cores per node: | 176 cores on two sockets (2 x 88) |
| Hardware threads per core: | 2 (Spatial Multi-Threading) |
| Hardware threads per node: | 2 x 88 x 2 = 352 |
| Clock rate: | TBA |
| Memory: | 500 GB LPDDR |
| Cache: | TBA |
| Local storage: | 240 GB |
| DRAM: | LPDDR5 |
Horizon hosts 4,752 "Vera Vera" (VV) nodes with 176 cores each. Each VV node provides a performance increase of over 3x compared to Frontera's CLX nodes and ~2x compared to the Grace Grace nodes of Vista. This improved per-node performance is due to an increase in core count (176 vs. 144), an increase in vector units per core (6 vs. 4), and an increase in memory bandwidth (2.4 TB/s vs. 1 TB/s). Each VV node provides over 13 TFlops of double-precision performance and over 2 TiB/s of memory bandwidth.
Login Nodes
The Horizon login nodes are Grace Grace (GG) nodes with 144 cores and 237 GB of LPDDR memory. They are compatible with the NVIDIA and GNU software stacks installed for the GB and VV nodes.
Network
The interconnect is based on Mellanox XDR technology, with full XDR (800 Gb/s) connectivity between the switches and the GB GPU nodes, and XDR400 (400 Gb/s) connectivity to the VV compute nodes. A fat-tree topology fully connects the compute nodes and the GPU nodes within separate trees, with no oversubscription within each tree. Every GB node is fully connected to every other GB node at full XDR (800 Gb/s), and every VV node is fully connected to every other VV node at XDR400 (400 Gb/s). Both sets of nodes connect to the $HOME and $SCRATCH file systems via InfiniBand.
File Systems
Horizon will use a shared VAST file system for the $HOME and $SCRATCH directories.
!!! warning
    The $WORK file system will not be available for early users. A $WORK file system will be made available later in 2026. The /tmp partition is also available to users but is local to each node.
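On TACC systems, per-user environment variables point at each file system. A quick way to confirm where they resolve (a sketch; the variable names are assumed to match other TACC systems such as Frontera, and $WORK will be unset for early users):

```shell
# Print where each file-system variable points. $SCRATCH and $WORK are
# assumptions based on other TACC systems; $WORK is unavailable to early users.
echo "home:    $HOME"
echo "scratch: ${SCRATCH:-<not set>}"
echo "work:    ${WORK:-<not set>}"
```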
Table 3. File Systems
in progress
| File System | Type | Quota | Key Features |
|---|---|---|---|
| $HOME | VAST | TBD | Not intended for parallel or high-intensity file operations. Backed up daily. |
| $WORK | VAST | TBD | Not backed up. |
| $SCRATCH | VAST | No quota; overall capacity ~400 PB | Not backed up. Files are subject to purge if access time* is more than 10 days old. See TACC's Scratch File System Purge Policy below. |
Scratch File System Purge Policy
!!! warning
    The $SCRATCH file system, as its name indicates, is a temporary storage space. Files that have not been accessed* in ten days are subject to purge.

    Deliberately modifying file access time, using any method, tool, or program, for the purpose of circumventing purge policies is prohibited.
*The operating system updates a file's access time when that file is modified on a login or compute node. Reading or executing a file/script on a login node does not update the access time, but reading or executing on a compute node does. This approach helps us distinguish between routine management tasks (e.g. `tar`, `scp`) and production use. Use the command `ls -ul` to view access times.
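A minimal sketch for spotting files nearing the purge window: `ls -ul` prints access times, and `find`'s `-atime +10` predicate matches files last accessed more than ten days ago (the directory variable below is illustrative).

```shell
# Inspect access times (the timestamp ls -ul prints is the atime),
# then list files not accessed in more than 10 days -- purge candidates.
dir="${SCRATCH:-.}"   # fall back to the current directory for illustration
ls -ul "$dir"
find "$dir" -type f -atime +10 -print
```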
Running Jobs
Like all other current TACC systems, Horizon employs the Slurm Workload Manager as its job scheduler. Slurm commands enable you to submit, manage, monitor, and control your jobs.
Slurm Partitions (Queues)
!!! warning
    Queue limits are subject to change without notice. Horizon admins may occasionally adjust queue settings to ensure fair scheduling for the entire user community.

TACC's `qlimits` utility will display the latest queue configurations.
Table 4. Production Queues
| Queue Name | Node Type | Max Nodes per Job | Max Job Duration | Charge Rate (per node-hour) |
|---|---|---|---|---|
| gb | Grace Blackwell | 128 | 48 hrs | 1 SU |
| gb-dev | Grace Blackwell | 16 | 2 hrs | 1 SU |
| gb-large | Grace Blackwell | 512 | 48 hrs | 1 SU |
| vv | Vera Vera | 256 | 48 hrs | 0.25 SUs |
| vv-dev | Vera Vera | 32 | 2 hrs | 0.25 SUs |
| vv-large | Vera Vera | 1024 | 48 hrs | 0.25 SUs |
Reminder: A Grace Blackwell node contains 1 Grace CPU and 2 Blackwell GPUs.
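As a sketch, a minimal batch script for the gb-dev queue might look like the following. The job name, project, node/task counts, and application are placeholders, and `ibrun` is TACC's usual MPI launcher, assumed here to be available on Horizon; consult `qlimits` for current queue settings.

```shell
#!/bin/bash
#SBATCH -J myjob            # job name (placeholder)
#SBATCH -A myproject        # allocation/project name (placeholder)
#SBATCH -p gb-dev           # Grace Blackwell development queue (Table 4)
#SBATCH -N 2                # nodes requested (gb-dev max: 16)
#SBATCH -n 8                # total MPI tasks
#SBATCH -t 01:00:00         # wall time, within the 2 hr gb-dev limit
#SBATCH -o myjob.%j.out     # output file; %j expands to the job ID

# Launch the application (ibrun is assumed; ./my_app is a placeholder).
ibrun ./my_app
```

Submit the script with `sbatch myjob.slurm`, monitor it with `squeue -u $USER`, and cancel it with `scancel <jobid>`.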