Horizon User Guide
Last update: April 16, 2026
This user guide is in progress
Notices
- Subscribe to Horizon User News. Stay up-to-date on Horizon's status, scheduled maintenance, and other notifications. (04/15/2026)
Introduction
Horizon is a National Science Foundation-funded system that is part of the Leadership Class Computing Facility (LCCF) award, Award Abstract #2323116. Please reference TACC when providing any citations.
Allocations
- NAIRR: https://nairrpilot.org/
- TxRAS: Submit to opportunities, including LRAC, at https://submit-tacc.xras.org/
System Specifications
Developed in collaboration with Dell Technologies, NVIDIA, VAST Data, Spectra Logic, Versity, and Sabey Data Centers, the Horizon supercomputer combines cutting-edge technologies with advanced infrastructure to redefine what is possible in scientific computing.
| Specification | Value |
|---|---|
| Performance (GPU component only) | 160 petaflops (FP64 units)<br>320 petaflops (FP32 units)<br>~288 petaflops |
| AI Performance | 20 exaflops for AI at BF16/FP16<br>40 exaflops for AI at FP8<br>80 exaflops for AI at FP4 |
| Scale | NVIDIA Grace Blackwell platform featuring 4,000 GPUs and NVIDIA Vera Vera servers featuring 4,752 nodes |
| Networking | Interconnected by the NVIDIA Quantum-X800 InfiniBand networking platform |
| Local All-Solid-State Storage | 400 PB of storage delivering more than 8 TB/s of read/write bandwidth, along with multi-tenancy and Quality-of-Service capabilities. |
| Efficiency | Up to 6x more energy efficient, powered by a new 20 MW data center with advanced liquid cooling in Round Rock, Texas. |
Grace Blackwell Compute Nodes
Horizon hosts 2,000 Grace Blackwell (GB) nodes. The GB subsystem consists of nodes based on the GB200 NVL4 platform. The NVL4 platform is configured as two nodes, each with two NVIDIA Blackwell GPUs (185 GiB of HBM3 memory each) and one Grace CPU (72 cores, 120 GiB of LPDDR5X memory).
A GB node provides 80 TFlops of FP64 performance (~160 TFlops DGEMM performance using NVML) and 20 PFlops of FP16 performance for ML workflows. The GB subsystem is housed in 28 racks, each containing 72 GB nodes. These nodes connect via an NVIDIA InfiniBand 800 Gb/s fabric to NVIDIA XDR InfiniBand switches using a fully connected two-level fat-tree topology.
Table 1. Grace Blackwell Specifications
| Specification | Value |
|---|---|
| GPU: | Blackwell, GB200 |
| GPU Memory: | 185 GiB |
| CPU: | NVIDIA Grace CPU |
| Total CPU cores per node: | 72 cores on one socket |
| Hardware threads per core: | 1 |
| Hardware threads per node: | 72 |
| Clock rate: | 3.4 GHz |
| Memory: | 240 GiB LPDDR |
| Cache: | 64 KB L1 data cache per core; 1MB L2 per core; 114 MB L3 |
| Local storage: | 130 GiB |
| DRAM: | LPDDR5 |
Vera Vera Compute Nodes
Table 2. Vera Vera Specifications
| Specification | Value |
|---|---|
| CPU: | NVIDIA Vera CPU |
| Total cores per node: | 176 cores on two sockets (2 x 88) |
| Hardware threads per core: | 2 (Spatial Multi-Threading) |
| Hardware threads per node: | 2 x 88 x 2 = 352 |
| Clock rate: | TBA |
| Memory: | 500 GB LPDDR |
| Cache: | TBA |
| Local storage: | 240 GB |
| DRAM: | LPDDR5 |
Horizon hosts 4,752 "Vera Vera" (VV) nodes with 176 cores each. Each VV node provides a performance increase of over 3x compared to Frontera's CLX nodes and ~2x compared to the Grace Grace nodes of Vista. This improved per-node performance is due to an increase in core count (176 vs. 144), an increase in vector units per core (6 vs. 4), and an increase in memory bandwidth (2.4 TB/s vs. 1 TB/s). Each VV node provides over 13 TFlops of double-precision performance and over 2 TiB/s of memory bandwidth.
Login Nodes
The Horizon login nodes are Grace Grace (GG) nodes with 144 cores and 237 GB of LPDDR memory. They are compatible with the NVIDIA and GNU software stacks installed for the GB and VV nodes.
Network
The interconnect is based on Mellanox XDR technology, with full XDR (800 Gb/s) connectivity between the switches and the GB GPU nodes, and XDR400 (400 Gb/s) connectivity to the VV compute nodes. A fat-tree topology fully connects the compute nodes and the GPU nodes within separate trees, with no oversubscription within each tree. Every GB node is fully connected to every other GB node at full XDR (800 Gb/s), and every VV node is fully connected to every other VV node at XDR400 (400 Gb/s). Both sets of nodes connect to the $HOME and $SCRATCH file systems via InfiniBand.
File Systems
Horizon will use a shared VAST file system for the $HOME and $SCRATCH directories.
!!! warning
    The $WORK file system will not be available for early users. A $WORK file system will be made available later in 2026. The /tmp partition is also available to users but is local to each node.
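On TACC systems, per-user environment variables point at each file system. A quick way to confirm where they resolve (a sketch; the variable names are assumed to match other TACC systems such as Frontera, and $WORK will be unset for early users):

```shell
# Print where each file-system variable points. $SCRATCH and $WORK are
# assumptions based on other TACC systems; $WORK is unavailable to early users.
echo "home:    $HOME"
echo "scratch: ${SCRATCH:-<not set>}"
echo "work:    ${WORK:-<not set>}"
```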
Table 3. File Systems
in progress
| File System | Type | Quota | Key Features |
|---|---|---|---|
| $HOME | VAST | TBD | Not intended for parallel or high-intensity file operations. Backed up daily. |
| $WORK | VAST | TBD | Not backed up. |
| $SCRATCH | VAST | No quota; overall capacity ~400 PB | Not backed up. Files are subject to purge if access time* is more than 10 days old. See TACC's Scratch File System Purge Policy below. |
Scratch File System Purge Policy
!!! warning
    The $SCRATCH file system, as its name indicates, is a temporary storage space. Files that have not been accessed* in ten days are subject to purge.

    Deliberately modifying file access time, using any method, tool, or program, for the purpose of circumventing purge policies is prohibited.
*The operating system updates a file's access time when that file is modified on a login or compute node. Reading or executing a file/script on a login node does not update the access time, but reading or executing on a compute node does. This approach helps us distinguish between routine management tasks (e.g. `tar`, `scp`) and production use. Use the command `ls -ul` to view access times.
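A minimal sketch for spotting files nearing the purge window: `ls -ul` prints access times, and `find`'s `-atime +10` predicate matches files last accessed more than ten days ago (the directory variable below is illustrative).

```shell
# Inspect access times (the timestamp ls -ul prints is the atime),
# then list files not accessed in more than 10 days -- purge candidates.
dir="${SCRATCH:-.}"   # fall back to the current directory for illustration
ls -ul "$dir"
find "$dir" -type f -atime +10 -print
```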
Running Jobs
Like all other current TACC systems, Horizon employs the Slurm Workload Manager as its job scheduler. Slurm commands enable you to submit, manage, monitor, and control your jobs.
Slurm Partitions (Queues)
!!! warning
    Queue limits are subject to change without notice. Horizon admins may occasionally adjust queue settings to ensure fair scheduling for the entire user community.

TACC's `qlimits` utility will display the latest queue configurations.
Table 4. Production Queues
| Queue Name | Node Type | Max Nodes per Job | Max Job Duration | Charge Rate (per node-hour) |
|---|---|---|---|---|
| gb | Grace Blackwell | 128 | 48 hrs | 1 SU |
| gb-dev | Grace Blackwell | 16 | 2 hrs | 1 SU |
| gb-large | Grace Blackwell | 512 | 48 hrs | 1 SU |
| vv | Vera Vera | 256 | 48 hrs | 0.25 SUs |
| vv-dev | Vera Vera | 32 | 2 hrs | 0.25 SUs |
| vv-large | Vera Vera | 1024 | 48 hrs | 0.25 SUs |
Reminder: A Grace Blackwell node contains 1 Grace CPU and 2 Blackwell GPUs.
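As a sketch, a minimal batch script for the gb-dev queue might look like the following. The job name, project, node/task counts, and application are placeholders, and `ibrun` is TACC's usual MPI launcher, assumed here to be available on Horizon; consult `qlimits` for current queue settings.

```shell
#!/bin/bash
#SBATCH -J myjob            # job name (placeholder)
#SBATCH -A myproject        # allocation/project name (placeholder)
#SBATCH -p gb-dev           # Grace Blackwell development queue (Table 4)
#SBATCH -N 2                # nodes requested (gb-dev max: 16)
#SBATCH -n 8                # total MPI tasks
#SBATCH -t 01:00:00         # wall time, within the 2 hr gb-dev limit
#SBATCH -o myjob.%j.out     # output file; %j expands to the job ID

# Launch the application (ibrun is assumed; ./my_app is a placeholder).
ibrun ./my_app
```

Submit the script with `sbatch myjob.slurm`, monitor it with `squeue -u $USER`, and cancel it with `scancel <jobid>`.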