NVIDIA announced several innovations in AI hardware at its GTC 2022 conference: among them are the next-generation Hopper architecture, the first data center GPUs based on it, and a “super CPU”, also for the data center. The company also stated its intention to build the world's fastest AI supercomputer, although details are scarce at this stage. Here is a summary of the most important information.
HOPPER AND THE H100 GPUs
The Hopper architecture, named after computing pioneer Grace Hopper, had been rumored for some time. NVIDIA says it focused specifically on improving performance for one of the most popular and widely used machine learning models, the Transformer, used for example by Google/Alphabet’s DeepMind in its AlphaFold algorithm: compared to the previous generation of chips, the architecture speeds up these computations by up to six times on H100 GPUs.
In fact, the gain can be even greater thanks to a number of further hardware innovations. The H100 packs 80 billion transistors and is the first GPU to support fifth-generation PCIe (PCIe 5.0). It also uses HBM3 memory, bringing total memory bandwidth to 3 TB/s. Finally, fourth-generation NVLink technology can connect up to 256 GPUs, with 9 times the bandwidth of the previous generation.
Overall, NVIDIA says the H100 can train Transformer models up to 9 times faster than the previous generation: models that took weeks to train may now take just a few days. Throughput for 16-, 32- and 64-bit floating-point operations also triples, and so does throughput for 8-bit operations.
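To make the “weeks to days” claim concrete, the short sketch below divides a baseline training time by NVIDIA's quoted “up to 9x” factor. It is a hypothetical illustration only: the function name and the three-week baseline are invented for the example, and it assumes the speedup applies uniformly to the whole training run, which is a best-case simplification.

```python
# Hypothetical illustration of NVIDIA's "up to 9x" Transformer training claim.
# Assumes the speedup applies uniformly to the entire run (best case).

def projected_training_days(baseline_days: float, speedup: float = 9.0) -> float:
    """Return the training time after applying an ideal speedup factor."""
    if baseline_days <= 0 or speedup <= 0:
        raise ValueError("baseline_days and speedup must be positive")
    return baseline_days / speedup


# Example: a hypothetical model that needs three weeks on the previous generation.
three_weeks = 21.0
print(f"{projected_training_days(three_weeks):.1f} days")  # prints "2.3 days"
```

Under these assumptions, a three-week job shrinks to a little over two days. Real-world gains depend on the model and on how much of the run actually benefits from the new hardware.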
GRACE CPU SUPERCHIP
The Grace superchip, Hopper’s ideal complement, actually consists of two CPUs connected directly by a very low-latency interconnect called NVLink-C2C. It is easy to draw parallels with the UltraFusion technology Apple announced with its new M1 Ultra chip, and indeed the render seems to show a very similar layout, but technical details are still too scarce for a more in-depth comparison.
In total, the superchip contains 144 Arm v9 cores, although their exact specifications are not yet known. We do know that it uses LPDDR5X memory with ECC, a genuine first, and that total memory bandwidth is 1 TB/s. NVIDIA explains that the superchip can be used in both CPU-only and GPU-accelerated systems. It is also worth noting that NVLink-C2C can be used to pair a CPU with a GPU, creating a “Grace Hopper” superchip.
EOS
For now, Eos, the AI supercomputer NVIDIA intends to build, is little more than a side note to the launch of the DGX H100, but the few technical details released are already impressive: this SuperPOD will consist of 576 DGX H100 systems, each containing 8 GPUs, so a simple multiplication shows that Eos will include no fewer than 4,608 latest-generation AI GPUs.
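The GPU count follows directly from the two figures NVIDIA has released. The sketch below simply spells out that multiplication; the constant names are chosen for illustration and do not come from any NVIDIA tool or specification.

```python
# Eos GPU count from the figures quoted in the article.
DGX_H100_SYSTEMS = 576   # DGX H100 systems in the Eos SuperPOD
GPUS_PER_SYSTEM = 8      # H100 GPUs in each DGX H100

total_gpus = DGX_H100_SYSTEMS * GPUS_PER_SYSTEM
print(total_gpus)  # prints 4608
```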
All in all, Eos is expected to deliver 18.4 exaFLOPS of AI performance, four times that of the Japanese supercomputer Fugaku, currently the industry benchmark. It is therefore very likely that when it comes online, expected before the end of this year, it will become the most powerful AI computer in the world. Eos should also hold its own in traditional scientific computing, with an expected 275 petaFLOPS: the current leader, again Fugaku, reaches 537, while second-placed IBM Summit reaches 200.
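The sketch below places Eos's expected scientific throughput alongside the two systems cited above and derives a rough per-GPU figure. It uses only the numbers quoted in the article, which are peak or expected values rather than measured benchmark results. The per-GPU values are a naive division that assumes the aggregate figures are a plain sum over all 4,608 GPUs. That is an assumption made for illustration, not an official per-GPU specification.

```python
# Rough comparison of Eos with the systems cited in the article.
# All figures are in petaFLOPS and come from the text; they are
# peak/expected values, not measured benchmark results.

EOS_GPUS = 576 * 8            # 4,608 H100 GPUs
EOS_AI_PFLOPS = 18_400        # 18.4 exaFLOPS of AI compute
EOS_SCIENTIFIC_PFLOPS = 275   # traditional scientific computing

scientific = {
    "Fugaku": 537,
    "Eos": EOS_SCIENTIFIC_PFLOPS,
    "IBM Summit": 200,
}

# Rank the systems by scientific throughput, highest first.
ranking = sorted(scientific, key=scientific.get, reverse=True)

# Naive per-GPU share (in teraFLOPS), assuming aggregates are a plain sum over GPUs.
ai_per_gpu_tflops = EOS_AI_PFLOPS * 1000 / EOS_GPUS
sci_per_gpu_tflops = EOS_SCIENTIFIC_PFLOPS * 1000 / EOS_GPUS

print(ranking)                                              # ['Fugaku', 'Eos', 'IBM Summit']
print(f"{ai_per_gpu_tflops:.0f} TFLOPS AI per GPU")         # 3993 TFLOPS AI per GPU
print(f"{sci_per_gpu_tflops:.1f} TFLOPS scientific per GPU")  # 59.7 TFLOPS scientific per GPU
print(f"Eos vs Fugaku: {EOS_SCIENTIFIC_PFLOPS / scientific['Fugaku']:.0%}")  # Eos vs Fugaku: 51%
```

On these figures, Eos would rank between Fugaku and Summit in traditional workloads, at roughly half of Fugaku's throughput.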