Nvidia’s AI superchip


THE NEWS

The company is no longer just about gaming

The GB200 Superchip, destined to power the world’s most powerful data centers.
© NVIDIA

ON THE FIRST DAY of its annual GPU Technology Conference (GTC), Nvidia CEO Jensen Huang showed off the company’s next big thing: Blackwell.

Nvidia is keen to point out that Blackwell is a platform, not a GPU, and will power the next generation of AI accelerators and 5000-series graphics cards. In other words, it is the foundation for everything from modest graphics cards to monster data-center kit, all designed to run the world-changing AI that seems destined to dominate the industry. It is named in honor of David Harold Blackwell, the mathematician who specialized in game theory, probability, and statistics.

Nvidia claims that Blackwell has “six revolutionary technologies” designed for AI training and real-time LLM inference. At its heart, the Blackwell silicon is two of the largest possible physical chips unified into one 4nm monster GPU with 208 billion transistors. It supports a second-generation transformer engine and fifth-generation NVLink, enabling up to 1.8TB/s of throughput per GPU. There’s a dedicated RAS engine, which stands for reliability, availability, and serviceability. A secure AI system is designed to protect models and data. Finally, there’s a dedicated decompression engine to accelerate database queries and data analytics.

Nvidia threw around some numbers for how much improvement Blackwell will offer, starting at 2.5 times faster than Hopper and rising to six or seven times on some workloads. When the metrics switch to LLM inference, the numbers go through the roof to 25 times faster. A lot of effort has been put into shifting data about, too. The platform includes a new NVLink switch chip, which has 50 billion transistors. The aim is for every GPU in a Blackwell system to ‘talk’ to every other GPU at full speed, effectively making one giant GPU.
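As a rough sanity check on those interconnect figures, here is the arithmetic using only the numbers quoted above, plus the 72-GPU rack configuration Nvidia describes for GB200 systems (a back-of-the-envelope sketch, not an official spec calculation):

```python
# Back-of-the-envelope aggregate NVLink bandwidth, using the article's figures:
# fifth-generation NVLink gives each GPU up to 1.8TB/s of throughput, and a
# full GB200 rack holds 72 Blackwell GPUs.
nvlink_per_gpu_tbs = 1.8   # TB/s per GPU, as quoted
gpus_per_rack = 72         # GPUs in a full GB200 rack

aggregate_tbs = nvlink_per_gpu_tbs * gpus_per_rack
print(f"Aggregate NVLink bandwidth: {aggregate_tbs:.1f} TB/s")  # 129.6 TB/s
```

That roughly 130TB/s of aggregate bandwidth is what lets the rack behave as ‘one giant GPU’ rather than 72 separate ones.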

The initial Blackwell chips have arrived in the form of the GB200 Grace Blackwell Superchip. This is designed for multi-node, liquid-cooled rack mounting in systems destined to run the most intensive workloads, meaning a lot of AI. It has two Blackwell B200 Tensor Core GPUs and a Grace CPU, all connected by a 900GB/s interconnect. This 40 petaFLOP monster is billed by Nvidia as the world’s most powerful chip. Put 72 of these together, and you have the first exaflop supercomputer that fits in a single rack. For context, the first machine that could manage an exaflop was switched on in May 2022, and filled 74 racks. Admittedly, Blackwell can only manage the feat by running FP4 inference instructions rather than full FP64 precision, but it is still a staggering show of power. Less frightening are the B100 and B200, replacements for the H100 and A100; each consists of a single Blackwell GPU.
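The exaflop claim follows directly from the article’s own figures. A quick sketch of the arithmetic (using the quoted 40 petaFLOPS per Superchip, which applies at FP4 precision):

```python
# Sanity-checking the single-rack exaflop claim with the article's numbers:
# each GB200 Superchip is rated at 40 petaFLOPS (FP4 inference), and a full
# rack holds 72 of them. One exaFLOP is 1,000 petaFLOPS.
petaflops_per_gb200 = 40
superchips_per_rack = 72

rack_exaflops = petaflops_per_gb200 * superchips_per_rack / 1000
print(f"Rack performance: {rack_exaflops:.2f} exaFLOPS")  # 2.88 exaFLOPS
```

At 2.88 exaFLOPS the rack clears the exaflop bar comfortably, though only at FP4; at FP64, the precision supercomputers are traditionally measured at, the figure would be far lower.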

Interesting early projects for Blackwell include Gr00t, whi