## Technical Specifications
The NVIDIA H200 sets a new standard for AI computing. Below are the detailed specifications and a comparison with the previous-generation H100.
| Feature | NVIDIA H200 | NVIDIA H100 |
|---|---|---|
| Memory Capacity | 141 GB HBM3e | 80 GB HBM3 |
| Memory Bandwidth | 4.8 TB/s | 3.35 TB/s |
| Architecture | NVIDIA Hopper™ | NVIDIA Hopper™ |
| Llama2 70B Inference | Up to 1.9x faster | Baseline |
| GPT-3 175B Inference | Up to 1.6x faster | Baseline |
## Key Features

### HBM3e Memory
The H200 is the first GPU to feature HBM3e memory, providing 141 GB of capacity. This allows larger models to fit in the memory of a single GPU, reducing the need for model parallelism and the communication overhead that comes with it.
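As a back-of-the-envelope illustration (not an NVIDIA tool), the sketch below checks whether a model's weights fit in a single GPU's memory at a given precision. The function name and figures are illustrative assumptions, and the estimate covers weights only, so real deployments need extra headroom for the KV cache and activations.

```python
# Back-of-the-envelope check (illustrative, not an NVIDIA tool): do a model's
# weights fit in a single GPU's memory at a given precision?

GPU_MEMORY_GB = {"H100": 80, "H200": 141}

def weights_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB. Ignores the KV cache and activations,
    which need additional headroom in any real deployment."""
    return params_billions * 1e9 * bytes_per_param / 1e9

fp16 = weights_footprint_gb(70, 2)  # Llama2 70B at FP16: ~140 GB of weights
for gpu, cap in GPU_MEMORY_GB.items():
    print(f"Llama2 70B @ FP16 ({fp16:.0f} GB) fits on {gpu} ({cap} GB): {fp16 <= cap}")
```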
### 4.8 TB/s Bandwidth
With 4.8 TB/s of memory bandwidth, roughly 1.4x the H100's 3.35 TB/s, the H200 can feed data to its compute cores fast enough to significantly accelerate memory-bound workloads like LLM inference.
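To see why bandwidth dominates here, consider a simple roofline-style argument: in autoregressive decoding, every weight is read roughly once per generated token, so per-token latency cannot beat weight bytes divided by memory bandwidth. The sketch below is a hypothetical illustration under that assumption (batch size 1, weights-dominated traffic), not a measured benchmark.

```python
# Roofline-style lower bound (illustrative, not a benchmark): in autoregressive
# decoding, every weight is read roughly once per generated token, so token
# latency cannot beat (weight bytes) / (memory bandwidth).

def min_ms_per_token(params_billions: float, bytes_per_param: float,
                     bandwidth_tb_s: float) -> float:
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return weight_bytes / (bandwidth_tb_s * 1e12) * 1e3

# 70B parameters at FP8 (1 byte/param), batch size 1:
for gpu, bw in [("H100", 3.35), ("H200", 4.8)]:
    ms = min_ms_per_token(70, 1, bw)
    print(f"{gpu}: >= {ms:.1f} ms/token, <= {1000 / ms:.0f} tokens/s")
```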
### Hopper Architecture
Built on the NVIDIA Hopper architecture, the H200 features the Transformer Engine, which intelligently manages precision (FP8, FP16, BF16) to optimize performance and efficiency for AI models.
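As a minimal sketch of how this looks in code, assuming NVIDIA's open-source Transformer Engine package (transformer_engine) and an FP8-capable GPU: the example below runs a linear layer under FP8 autocasting, following the library's documented quickstart pattern. The layer dimensions and batch size here are arbitrary.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 recipe with delayed scaling; the HYBRID format uses E4M3 for forward
# tensors and E5M2 for gradients (all recipe arguments are optional).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True)   # TE module, placed on CUDA by default
x = torch.randn(512, 4096, device="cuda")  # dimensions here are arbitrary

# Inside the autocast context, supported ops run in FP8 with automatic scaling.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()  # the backward pass also uses FP8 where supported
```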
### NVLink Switch System
The H200 supports fourth-generation NVLink, providing up to 900 GB/s of GPU-to-GPU bandwidth, and the NVLink Switch System extends this high-speed communication across nodes. This is critical for scaling training and inference to thousands of GPUs.
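A minimal sketch of what this communication looks like in practice, assuming PyTorch with the NCCL backend (NCCL routes collectives over NVLink automatically when it is available); the script name and tensor sizes are illustrative.

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # NCCL picks NVLink paths when present
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Every rank contributes a tensor filled with its own rank id; all_reduce
    # sums them in place, so all GPUs end up with the same result.
    t = torch.full((1024, 1024), float(dist.get_rank()), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        expected = sum(range(dist.get_world_size()))
        print(f"all_reduce result {t[0, 0].item():.0f}, expected {expected}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```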