Hyperscale Hardware: Components for the AI Cloud

Sagawa

As the TMT world moves resolutely into the AI/cloud era, investment in hyperscale data centers is surging, while spending on private enterprise data centers has begun an inevitable decline. Hyperscale architecture differs from the established enterprise approach in meaningful ways, with considerable implications for technology suppliers. The big cloud operators (e.g. GOOGL, AMZN, MSFT, and FB) typically buy standard parts in bulk for their self-designed modular rack systems, while enterprises rely on configured systems sold by OEMs, which often employ more feature-rich or custom components. With the demands of massive cloud applications and cutting-edge AI, hyperscale players also have some unique technology needs – e.g. very high-speed optical interfaces, open switch fabrics, and AI-friendly graphics processors. Suppliers specializing in these areas (e.g. MLNX, ANET, IPHI, LITE, CIEN, MSCC, NVDA, etc.) without significant exposure to the traditional data center market could prosper. Other suppliers – generally systems vendors and the makers of costly ASICs designed specifically for their products – will suffer increasingly. Component vendors that straddle the two markets – e.g. INTC, STX and WDC – are generally feeling the pain of declining spend by their enterprise OEM and device customers more acutely than the benefits of rising hyperscale investment. We believe an inflection point may be more than a few quarters away.

Huge growth in hyperscale data center spending. Led by the big four US cloud platforms (AMZN, GOOGL, MSFT and FB), spending on very large capacity, distributed (hyperscale) data center infrastructure is growing at a 40.2% annual pace. These investments support the huge growth of consumer apps, the exodus of enterprise IT into the public cloud, and the rise of deep learning AI. At the same time, we anticipate spending on private enterprise data center capacity will decline at a -7.7% CAGR, with weak demand for PCs, premium smartphones and other devices a further burden on many component suppliers. The net impact of this generational paradigm shift is likely to be highly deflationary for overall hardware spending.

Hyperscale commoditizes hardware. Cloud operators eschew the high-margin integrated systems historically favored by enterprise IT, choosing instead to buy mostly standard, off-the-shelf components to be installed by contract manufacturers onto bare-bones modules of their own design. This has already been damaging for the makers of servers, storage systems, and networking gear, who now must compete with inexpensive hyperscale-inspired “white box” alternatives in the private data center market as well. We estimate that 36% of all server capacity sold is now hyperscale or white box. Networking switches are a few years farther back, but on the same path, with white box up to 14% of industry sales.

High performance networking opportunities. The sheer size of networked hyperscale data centers, along with the AI ambitions of the companies that operate them, creates new opportunities for capabilities unnecessary in the enterprise market. Where private data centers typically run 10Gbps connections, hyperscale facilities demand 40-100Gbps, with strong appetite for high performance server interfaces, switches, and optical data center interconnect solutions. We see component vendors MLNX, IPHI, LITE, AMCC and MSCC as well positioned against these opportunities. Generally, these operators eschew configured systems, preferring bare-bones “white box” solutions built from open standard parts to run their own proprietary networking software, although very high performance switches and optical transmission gear from vendors like ANET, CIEN, NOK, INFN and others are also in demand.

Deep learning needs GPUs. The architecture of a CPU is designed to queue up multiple different computing tasks to be performed in sequence, following software instructions at each step. In contrast, a GPU is designed to complete the same single computing task repeatedly, as fast as possible. The highly iterative nature of machine learning is much better suited to GPUs than CPUs, prompting the biggest hyperscale operators to accelerate spending on them. GOOGL has designed its own ASIC solution, the Tensor Processing Unit (TPU), but AMZN, MSFT and FB have all opted for AI-optimized GPUs from NVDA. MSFT is also deploying FPGAs from INTC’s ALTR for machine vision AI, favoring them over GPUs in this application for their superior power efficiency.
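The contrast can be sketched in a few lines of Python, using NumPy as a stand-in for the hardware (a hypothetical illustration, not production AI code; the `matmul_loop` helper is ours). Deep learning workloads are dominated by large matrix multiplies: the triple loop below mirrors the sequential, instruction-at-a-time CPU model, while the single bulk multiply is the data-parallel pattern a GPU executes across thousands of cores at once.

```python
import numpy as np

# Small matrices for illustration; real deep learning layers are far larger.
n = 64
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def matmul_loop(a, b):
    """Sequential, CPU-style matrix multiply: one
    multiply-accumulate at a time, step by step."""
    n = a.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i, k] * b[k, j]
            out[i, j] = s
    return out

# Data-parallel style: one bulk operation over the whole array --
# the same arithmetic, expressed as the pattern GPUs accelerate.
out_fast = a @ b
out_slow = matmul_loop(a, b)
assert np.allclose(out_fast, out_slow)  # identical results, very different execution models
```

The two paths compute identical results; the difference is purely in how the work is scheduled, which is why iterative training workloads map so naturally onto GPUs.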

Power efficiency. Electricity costs can be 25% or more of the total operating budget of a hyperscale data center. As such, cloud operators place substantial emphasis on power efficiency – note MSFT’s interest in FPGAs for deep learning despite their performance disadvantage. The major cloud operators are also monitoring the progress of ARM-based CPUs from AMCC, QCOM and others, which could show significant power efficiency gains vs. x86 processors – we believe any meaningful shift in total spending is still a few years away. This efficiency focus is also apparent in the design of the power systems themselves – GOOGL is leading a move to all-DC power components to avoid wasteful AC/DC conversions, a serious negative for makers of analog power components and systems.

The good with the bad. Most data center component players sell to both hyperscale operators and system OEMs, some with added exposure to the deteriorating markets for PCs, tablets and premium smartphones. Moreover, the parts used in hyperscale infrastructure are typically simpler, cheaper and lower margin than those sold into enterprise systems. For CPU maker INTC and disk drive vendors STX and WDC, growth in demand from the cloud may not be sufficient to offset the erosion of their traditional markets. While we had been optimistic that disk drives and server CPUs might find their bottom, we now believe that the inflection point is likely still several quarters ahead.

Tough times for most systems vendors. While some cloud operators – notably AAPL and a few others – still rely on traditional systems companies to design and deploy their data centers, the momentum is overwhelmingly toward bare-bones “white box” solutions, often manufactured directly to the customer’s own specifications. This is generally bad news for companies like HPE, DELL, EMC, CSCO, JNPR, IBM and others. However, the need for very high performance, particularly for networking between facilities, leaves room for vendors like ANET, CIEN, NOK, INFN, and others to compete successfully.

For our full research notes, please visit our published research site.
