Clockwork Helps GPUs Keep Busy

Startup Clockwork.io recently released a flurry of announcements highlighting its position in the market for AI cluster efficiency, a growing concern for enterprises, hyperscalers, and neoclouds. In the process, the vendor laid claim to some capabilities that make it stand out.
Topping the news was Clockwork’s launch of FleetIQ, a software-based solution designed to keep GPU clusters working at top efficiency. Based on complex telemetry developed at Stanford, FleetIQ acts as “a Waze for GPU clusters,” according to Dan Zheng, Clockwork’s VP of Products and Solutions.
Zheng notes that link failures and other disruptions in today’s AI networks are significant, leaving GPUs from NVIDIA or AMD running at only 30% to 55% of their theoretical performance. The result is enormous waste: Clockwork estimates that a typical 100,000-GPU hyperscaler/neocloud training cluster priced between $5 billion and $7 billion could lose over $2.25 billion in unused capacity given these failure rates.
And that’s the problem Clockwork says it solves by combining highly granular observability with failover capabilities and adaptive routing to minimize delays and maximize throughput in GPU clusters.
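A back-of-the-envelope calculation shows where a figure of that size could come from. The sketch below is an assumption for illustration, not Clockwork’s stated method: it simply treats the idle share of a cluster as wasted capital, using the price and utilization ranges quoted above. The function name and formula are hypothetical.

```python
# Rough estimate only: the "idle share = wasted capital" formula is an
# assumption for illustration, not Clockwork's published methodology.

BILLION = 1_000_000_000


def unused_capacity_usd(cluster_cost_usd: float, utilization: float) -> float:
    """Return the dollar value of the idle fraction of a cluster."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return cluster_cost_usd * (1.0 - utilization)


# Walk the corners of the ranges quoted in the article.
for cost in (5 * BILLION, 7 * BILLION):
    for utilization in (0.30, 0.55):
        wasted = unused_capacity_usd(cost, utilization)
        print(
            f"${cost / BILLION:.0f}B cluster at {utilization:.0%} utilization: "
            f"${wasted / BILLION:.2f}B unused"
        )
```

Under this simple model, the most favorable corner of the quoted ranges (a $5 billion cluster at 55% utilization) leaves about $2.25 billion idle, which lines up with the “over $2.25 billion” estimate. The other corners come out higher.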
Synchronized Timing and Traffic Control
At the heart of Clockwork’s technology are what the vendor calls Global ClockSync and Dynamic Traffic Control. The first refers to a technique that establishes a common sub-microsecond-accurate “timeline” for hosts, NICs, and switches. Packet probes identify traffic patterns to ensure everything is working according to the timing set by Global ClockSync.

Clockwork.io's Global ClockSync depicted. Source: Clockwork.io
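To illustrate why a shared timeline matters: once sender and receiver timestamps come from synchronized clocks, a probe’s one-way delay is a simple subtraction, with no need to halve a round-trip time. The following is a minimal sketch under that assumption. The `Probe` record, link names, and threshold are hypothetical and do not reflect Clockwork’s actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One probe packet, timestamped on a shared timeline (hypothetical record)."""
    link: str          # made-up link identifier
    sent_ns: int       # sender timestamp, nanoseconds
    received_ns: int   # receiver timestamp, nanoseconds


def one_way_delay_ns(probe: Probe) -> int:
    # Only meaningful if both hosts' clocks are synchronized.
    return probe.received_ns - probe.sent_ns


def slow_links(probes: list[Probe], threshold_ns: int) -> list[str]:
    """Return links whose worst observed one-way delay exceeds the threshold."""
    worst: dict[str, int] = {}
    for probe in probes:
        delay = one_way_delay_ns(probe)
        worst[probe.link] = max(worst.get(probe.link, delay), delay)
    return sorted(link for link, delay in worst.items() if delay > threshold_ns)


# Made-up sample data for illustration.
probes = [
    Probe("leaf1-spine1", sent_ns=1_000, received_ns=3_200),
    Probe("leaf1-spine1", sent_ns=9_000, received_ns=11_500),
    Probe("leaf2-spine1", sent_ns=1_000, received_ns=2_900),
    Probe("leaf2-spine1", sent_ns=9_000, received_ns=17_400),
]

print(slow_links(probes, threshold_ns=5_000))
```

With unsynchronized clocks, the subtraction would fold the clock offset between the two hosts into every measurement, so a congested link and a skewed clock would look the same.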
Dynamic Traffic Control (DTC) refers to Clockwork’s capability to optimize traffic and reroute it to avoid congestion and network disruptions such as link flapping. The software works across a diverse array of hardware configurations, including Ethernet networks, RoCE links, and InfiniBand connections. It also works with any form of XPU, including components from NVIDIA and AMD, the vendor says.
Together, Global ClockSync and DTC make up what Clockwork calls FleetIQ’s Software-Driven Fabric (SDF), which the vendor describes this way:
“SDF continuously monitors network conditions with nanosecond precision, enabling immediate detection of link flaps, NIC failures, or other disruptions. By automatically rerouting traffic around affected areas without human intervention, SDF prevents the costly interruptions typical in large-scale AI environments. As a result, cluster utilization and operational efficiency increase substantially.”
Clockwork’s technology faces a range of competition, including from observability vendors such as Datadog and Dynatrace, as well as from autonomous networking suppliers such as Juniper Networks (now part of HPE). The vendor claims to differentiate based on its combination of granular observability and rerouting, along with its software-only architecture, which supports traffic across a broad range of network configurations. Scale is also a plus: “We work with anyone with over 500 GPUs,” said Clockwork’s Zheng.
Notably, the capability to observe events in less than a microsecond helps Clockwork to catch network problems more quickly than some competing network observability tools.
Customer Endorsements
Clockwork.io’s datacenter networking capabilities have drawn a roster of customers that includes Uber and Wells Fargo, along with neoclouds Nebius, NScale, and WhiteFiber, as well as the Danish Centre for AI Innovation (DCAI), which operates Denmark’s Gefion supercomputer.
Here’s how Albert Greenberg, Chief Architect Officer at Uber, described the company’s use of Clockwork in the FleetIQ press release:
“[Clockwork’s] unique innovation can greatly help Uber expedite the detection and fault-localization of networking issues: from hours to minutes, which will greatly improve service tail latency and prevent noisy neighbor impact.
“We are in the process of rolling out Clockwork across Uber infrastructure, and look forward to experiencing their full capabilities at Uber's scale.”
Funding and New Leadership
Clockwork has other news as well. In September, the eight-year-old vendor closed a round of $20.6 million, increasing its funding to about $41.6 million. The round was led by NEA with participation from Intel CEO Lip-Bu Tan, former Cisco CEO John Chambers, venture investor Carl Ledbetter, and e& Capital.
Clockwork also has a new CEO, Suresh Vasudevan (ex-Sysdig, Nimble Storage, and many others), and a new VP of Worldwide Sales, Joe Tarantino (ex-GMI Cloud and Cohesity).
Futuriom Take: Clockwork.io’s networking clock-synchronization, dynamic rerouting, and software-only architecture have boosted its profile among the growing roster of solutions for AI cluster efficiency. Recent high-profile investors and customers indicate that for its unusual technology, the moment has arrived.