NVIDIA Ethernet Drives xAI's Supercomputer
In news that flew a bit under the radar, NVIDIA announced last month that xAI’s gigantic new supercomputer in Memphis, Tennessee—called Colossus—is using NVIDIA Spectrum-X Ethernet networking to connect 100,000 NVIDIA Hopper GPUs.
xAI, of course, is the company headed by Elon Musk (who is also CEO of Tesla and SpaceX). In its mission to “understand the universe,” xAI is training a series of large language models (LLMs), including Grok-0, Grok-1, and Grok-2.
The news of NVIDIA's role in Colossus is significant on two fronts. It underscores NVIDIA's dominance in the supercomputing market. It also illustrates the strength of NVIDIA's Spectrum-X brand of Ethernet networking, which retains the interoperability of standard Ethernet while supporting the demanding bandwidth and latency requirements of AI training.
Demands of the Largest Supercomputer
Colossus, which xAI and NVIDIA claim is the world's largest supercomputer, illustrates just how enormous AI's demands have become. To build it, xAI recruited some of the world's top engineers. The cluster comprises 100,000 H100 GPUs, configured in HGX servers containing eight GPUs each. (That's for now: The cluster will soon be expanded by another 100,000 GPUs.) The servers are housed in Supermicro liquid-cooled racks, with 64 GPUs per rack, for a total of roughly 1,500 GPU racks in the Colossus cluster.
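As a quick sanity check on the figures above (the server and rack counts are derived here from the article's numbers, not stated by xAI or NVIDIA):

```python
# Back-of-the-envelope math for the Colossus cluster, using the
# figures cited in the article.
total_gpus = 100_000
gpus_per_server = 8   # HGX servers with eight H100s each
gpus_per_rack = 64    # Supermicro liquid-cooled racks

servers = total_gpus // gpus_per_server  # 12,500 HGX servers
racks = total_gpus / gpus_per_rack       # ~1,562 racks

print(f"{servers:,} servers, ~{racks:,.0f} racks")
```

The rack figure works out to about 1,562, consistent with the roughly 1,500 racks cited; the planned expansion to 200,000 GPUs would double each of these counts.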