Hyperscale Semiconductors: Processor Diversity Coming to The Cloud


Paul Sagawa / Tejas Raut Dessai
203.901.1633 / .553.9827
psagawa@ / trdessai@ssrllc.com
@PaulSagawaSSR

SEE LAST PAGE OF THIS REPORT FOR IMPORTANT DISCLOSURES

November 16, 2017


The market for datacenter-class processor chips is shifting decisively toward cloud-based hyperscale operators, a trajectory that will continue for many years. To date, this shift has benefitted INTC and its x86 architecture, but two major change vectors threaten this hegemony: 1. Explosive AI growth – AI software runs faster and better on servers that deemphasize CPUs in favor of parallel processing architectures (GPUs, FPGAs or ASICs). 2. New CPU alternatives – Hyperscalers have lower switching barriers and are evaluating alternative CPUs (ARM, Power9) for non-AI workloads. NVDA, XLNX, QCOM, and IBM are the likely beneficiaries of this shift, with NVDA, whose GPUs and CUDA SW architecture are favored for AI training, the biggest winner. INTC faces substantial risks to its now-successful cloud server franchise.

  • Hyperscale platforms will dominate future datacenter processor demand. With substantial intrinsic cost and performance advantages over private datacenters, the largest hyperscale platform operators (i.e. GOOGL, FB, AMZN, MSFT, IBM, BIDU, BABA and Tencent) will capture a rapidly increasing share of enterprise computing workloads, likely surpassing 50% in developed markets within a decade (http://www.ssrllc.com/publication/5g-hyperscale-ai-5g-sea-change/). These companies have already surpassed PC OEMs as the largest individual buyers of server processor chips, and will be the primary source of growth in demand going forward. We expect processor demand from the hyperscale market to grow at a ~25% annual pace through 2022.
  • Hyperscale workloads are still overwhelmingly run on INTC CPUs. To date, the top cloud platforms have relied on high-end x86 server chips from INTC for internal and commercial datacenters – estimates of its CPU share in the cloud market range as high as 98%. This strength drove INTC’s 3Q17 surprise, the 24% rise in the segment offsetting static demand from PCs and declines from enterprise datacenters. However, changes at hyperscale operators will work against the x86 hegemony.
  • Changing workloads favor alternative processor architectures. AI workloads have very different requirements and will drive a shift to new server configurations. Training learning-based models requires running many simple algorithms in parallel, iterating through data at very high speed as developers tweak the models. Clusters of specialized co-processors working together are dramatically more efficient at these workloads and can be coordinated by a relatively simple (and cheap) CPU. Running the resulting “inference” models for AI in active applications is a somewhat different task, evaluating live data inputs in the context of previous learning. CPUs can handle very simple inference, but sophisticated models require the speed provided by co-processing. We estimate that AI co-processors for cloud platforms, about 16% of cloud processor purchases today, will grow to more than 42% of the market by 2022, posting a 50%+ CAGR over that time.
  • For AI, GPUs will be standard, FPGAs common and ASICs focused. There are 3 different approaches to AI co-processing. Today, most AI runs on boards with 8-16 GPUs controlled by a single low-cost CPU. NVDA dominates this model, having pioneered AI-focused GPU designs and established its CUDA software platform as a standard for AI developers, and we believe its advantages are sustainable. GPUs are the overwhelmingly preferred choice for training (about 40% of the future AI market) and are a solid choice for inference (60%). The primary alternative for inference models is the FPGA, which offers faster execution and better power use than GPUs but less flexibility and higher prices. We project FPGAs to capture about a third of the inference market. Finally, ASICs will appeal to companies with very large, homogeneous, internal AI inference workloads – i.e. GOOGL and, maybe, FB. The very high upfront development costs and scale requirements make it unlikely that platforms with more diverse needs would follow the ASIC path.
  • Alternative CPUs will challenge x86. AI co-processors are not the only threat to x86 in the cloud. The 8 companies that will make up an ever-larger proportion of datacenter demand regularly reevaluate their configurations and have the wherewithal to port certain types of workloads to new architectures if advantageous. MSFT and BABA participated in last week’s launch presentation for QCOM’s ARM-based Centriq datacenter CPUs, and GOOGL and FB are known to be evaluating the chip. Centriq, which packs transistors more densely than INTC’s top-of-the-line CPU, aims to sharply cut power draw while delivering better performance for typical cloud processing workloads. CAVM and AMCC also offer ARM-based datacenter chips. IBM’s new Power9 chips also stack up well on performance, while offering full on-chip encryption for superior security. IBM has already been named to supply Power9 for supercomputers being developed by the US government, and the chip is being tested at GOOGL and in IBM’s own cloud. It is possible that ARM and/or Power9 could grow to challenge the x86 hegemony.
  • NVDA, QCOM, XLNX, and IBM lead potential beneficiaries. We believe NVDA’s GPU sales to hyperscale datacenters could post a 41% CAGR over the next 5 years, taking almost half of the AI co-processor opportunity. XLNX could build a ~$1.5B business in FPGAs used for AI inference over the same period, a 10-fold increase from today. We see QCOM’s ARM-based Centriq chips as the strongest alternative to INTC for cloud platforms, and project $1-2B in sales potential by 2022 – a 10% boost to its current chip revenues. IBM’s Power9 CPU is intriguing, particularly as an adjunct CPU for GPU co-processing and for end-to-end encryption. We are less enthused with AMD’s prospects – it is focused on an x86 architecture that cloud platforms are deemphasizing, it is badly outclassed by NVDA in GPUs, and its historical execution has been poor. CAVM and AMCC are true wildcards with puncher’s chances.
  • INTC’s datacenter dominance will be eroded. With the ascent of AI, CPUs will drop from about 84% of cloud processor purchases to less than 60% over the next 5 years. In the same period, we expect INTC’s share of CPUs to fall from well above 95% to ~75%, as alternative processors from QCOM, IBM and others make inroads. Even with this competitive incursion, very strong hyperscale datacenter investment should still yield low-teens sales growth for INTC in the segment, likely a bright spot for a company facing erosion in private datacenters and in PCs. However, increased competition should also mean substantial margin pressure on INTC’s most profitable products.

CPUs, GPUs, TPUs and FPGAs

INTC won the PC Era, dominating the desktop, laptop, and datacenter server markets. The emerging Cloud/AI Era has seen INTC whiff on the monster mobile device market because of its lack of vision and inept execution. However, INTC and its x86 CPU architecture still rule the datacenter, and especially hyperscale cloud datacenters, where its most expensive Skylake XEON processors rule the roost. The pace of demand from the hyperscalers and the ferocity of the competition amongst them have INTC enjoying increasing $/MIPS for its server chips for the first time in a long while. Datacenter CPU sales were up 12% in 3Q17, led by a 24% jump in cloud operator spending.

Alas, threats loom. The top hyperscale operators – i.e. GOOGL, FB, AMZN, MSFT, IBM, BIDU, BABA and Tencent – divide their workloads into about a dozen types and design separate configurations for each. Older designs are replaced every three years or so, and entirely new configurations are occasionally added to address emerging applications, such as AI. CPUs can be changed out or deemphasized in favor of co-processing, and the big platforms have the wherewithal to port their important internal applications to new hardware if needed. With about 30% of server chip demand coming from this fast-growing but highly concentrated group, the x86 hegemony is at risk.

AI is the biggest change – more than 40% of datacenter workloads could be AI within 5 years. The two phases of AI – training and inference (aka the working application) – both require processors to repeat simple tasks many times at very high speed. CPUs, which are designed to perform complex computations a single time and move on to the next, are a mismatch for AI processing. A few years back, AI developers began to use graphics processors (GPUs), first for training and then for inference as well. Clusters of GPUs run the repetitive computations in parallel, supported by a relatively simple (and cheap) CPU. NVDA, the first to tailor GPUs to the specific needs of AI, has been rewarded for its vision – its chips and its underlying software layer CUDA are standard for most AI datacenter solutions.

Inference models, fully trained and used in working applications, may benefit from an even more tailored solution. Companies with very large applications with very specific needs may design their own custom ASIC. We believe a very small number of platforms have franchises big enough to justify the huge upfront and scale-related costs of ASIC development – perhaps just GOOGL (search) and FB (image processing). MSFT has championed the use of FPGAs (field programmable gate arrays), which can be programmed to address specific computing needs and are sometimes used in conjunction with a GPU. AMZN and BIDU have joined the FPGA bandwagon. XLNX is well positioned for this opportunity.

Finally, the top hyperscale operators are very open to alternatives to x86 for their CPU needs. ARM, whose processor architecture dominates mobile devices, has developed a design for datacenter processor chips. QCOM, the leader in those mobile chips, has introduced its Centriq line of datacenter CPUs. Initial reviews suggest that Centriq may outperform XEON in high capacity scenarios, with superior price/performance and power efficiency. MSFT and BABA are early customers, and GOOGL and FB are evaluating it. Similarly, IBM’s new Power9 chips, with integrated encryption throughout processing, have drawn interest, certainly from IBM’s own platform but also from GOOGL. We believe that these alternatives could capture 20-30% of CPU demand within 5 years.

It’s a Cloud/AI World

We believe that we are amidst a generational paradigm shift in computing, marked by the rise of massive hyperscale datacenter platforms in the cloud, running extraordinarily powerful AI-enabled software applications, for users on portable devices, connected over very high-speed wireless connections (see http://www.ssrllc.com/publication/the-cloudai-era-a-perspective-on-the-next-decade-of-tmt-investing-2/). This is manifesting in the growth of massive franchise applications, like Google’s or Facebook’s many consumer services, in SaaS applications, like Salesforce or Microsoft Office 365, and in the rise of public cloud hosting platforms, like Amazon Web Services, Microsoft Azure and Google Cloud Platform. Already, cloud platforms make up almost a third of total datacenter processor demand. We expect the public cloud hosting market to grow faster than the 26% CAGR suggested by some industry analysts, sucking computing workloads away and discouraging investment in private datacenters. At this pace, more than half of enterprise computing will be in the cloud within a decade, along with a large majority of the investment (Exhibits 1, 2).

Exh 1: Public Cloud Services Market Revenue and Growth Forecast, 2016 – 2026

A big driver for this exodus is cost – we estimate that the best cloud platforms may have operating costs an order of magnitude lower than the average enterprise datacenter. These substantial cost savings come from many factors (Exhibit 3). The biggest is capacity utilization – for example, Google runs its datacenters at roughly 40% utilization, while the average enterprise manages less than 12%. Cloud platforms that buy server components directly also save by cutting out systems OEMs and their healthy margins and through negotiating leverage. Intel has nicknamed its top cloud customers – Google (its largest server chip buyer), Facebook, Microsoft, Amazon, Alibaba, Baidu and Tencent – the Super 7, giving them first crack at its newest processors to test and buy. This reflects both their importance to Intel and their power.
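To make the utilization arithmetic concrete, a back-of-envelope sketch in Python (the normalized server cost is an assumption for exposition; the utilization figures are the illustrative ones quoted above):

```python
# Back-of-envelope: how utilization alone widens the cloud cost gap.
server_cost_per_hour = 1.00      # normalized cost of one server-hour (assumed)

cloud_utilization = 0.40         # roughly Google's fleet utilization, per above
enterprise_utilization = 0.12    # average enterprise, per the estimate above

# Effective cost per *useful* server-hour = raw cost / utilization
cloud_effective = server_cost_per_hour / cloud_utilization            # ~2.50
enterprise_effective = server_cost_per_hour / enterprise_utilization  # ~8.33

print(f"Utilization advantage alone: {enterprise_effective / cloud_effective:.1f}x")
# -> ~3.3x; direct component purchasing, scale, and power efficiency
#    stack on top of this toward the order-of-magnitude gap.
```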

Exh 2: Cloud Services Market and Hyperscale Growth Forecast, 2016- 2020

Exh 3: Basic On-Premise versus Cloud Cost Comparison

As such, continued exuberant investing on the part of the Super 7 (and a few others) may be offset, in part, by shrinking spending for private datacenters and by the confident negotiating stance of these few, powerful customers.

Intel Still Dominates

While a few other makers of datacenter-class processors have been getting attention – more on that later – Intel still dominates the market. Intel sold more than $18B in chips to the datacenter market over the past 12 months, 10+ times more than either long-time x86 rival AMD or GPU king Nvidia (Exhibit 4). In cloud datacenters, Intel’s share of x86 sales is believed to be much higher, as much as 98%, and while GPUs are obviously popular on these platforms, Intel’s total take of processor spending likely approaches 90%.

Why Intel? The major cloud franchises still run plenty of more traditional software, particularly for hosted applications. Even for applications infused with AI functions – e.g. predictive modeling, natural language processing, image processing, etc. – much of the software is still relatively traditional and those workloads will separate from the AI and run on CPUs rather than alternative configurations. Right now, x86 is the standard for most cloud computing and Intel has the highest performance chips. Indeed, the Super 7 jostle for position to be first to get shipments of each new CPU as it comes out – most recently, it was Google bringing up the Skylake processors a few months ahead of everyone else (including the commercial OEMs, like HP, that used to be first).

Exh 4: Intel Data Center Group Segment Financials, 3Q14 – 3Q17

Times, They are a Changin’

Intel and x86 may rule the cloud today, but the situation is not likely permanent. Hyperscalers are known for buying components directly and having board configurations manufactured to their specifications. Often, outsiders presume that the purpose for this is purely cost, and that Google, Amazon, Facebook, et al. are each building a single design in massive quantities to fill rack after homogenous rack in their datacenters.

This is a misperception. While cost is an obvious benefit, the most important aspect of the self-designed servers is to match hardware configurations precisely to the needs of each company’s biggest categories of computing workflows (Exhibit 5). Interpreting a spoken command may need two different AI models – one to recognize it and another to assess its meaning – but it also needs to search databases, execute calculations, manage communications, and perform other functions that fall into traditional computing categories. Typically, a hyperscale platform operator will maintain a dozen or so different configurations, each tailored to a specific workflow need. These configurations are reevaluated on a regular basis, with replacements designed and systematically rolled out. Google estimates that it introduces at least 3 new server designs to its internal and external datacenters each year, with the typical design taken out of service within about 3 years. It also confirms that its server diversity is increasing.

Exh 5: Global Data Center Workloads by Applications Forecast, 2015 – 2020

Within this, there are two distinct threat vectors for the x86 architecture. The first is the rise of AI. We have written extensively on machine learning technology and its use cases (AI Data – Food for Artificial Thought; 5G Hyperscale AI – 5G Sea Change; AI-As-A-Service – Deep Learning is Fundamental; The Cloud AI Era – Perspective on the Next Decade of TMT Investing). From a processing hardware perspective, the important takeaways are that AI models favor very different processing architectures and that the demand for AI-tuned servers is growing very rapidly. The second threat vector is the openness of cloud operators to alternative processor architectures for more traditional computing and the rise of credible CPU options. We will assess both threat vectors in turn.

Parallel Processing for AI

Traditional CPUs, like the x86 architecture, are built for flexibility, with a broad instruction set and the ability to execute complex calculations of many different types as they are presented. This flexibility is important for traditional computing, where the CPU must handle job requests of many different kinds presented from many different applications. However, flexibility comes at a cost – because the processor must interpret computing commands before executing them, it wastes time if asked to execute a long chain of similar, simple commands. AI models, built of thousands of interlinked but simple algorithms, thrive on processor configurations that can rip through those long chains of similar commands (Exhibit 6).
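The contrast can be sketched in a few lines of Python (an illustrative toy, not a benchmark of any actual chip; NumPy's bulk operation stands in for the data-parallel hardware):

```python
import numpy as np

weights = np.random.rand(1_000_000)  # one "layer" of simple model parameters
inputs = np.random.rand(1_000_000)

# CPU-style serial execution: fetch, decode and execute one command at a
# time. Per-instruction overhead dominates when every command is this simple.
acc = 0.0
for w, x in zip(weights, inputs):
    acc += w * x

# Parallel-style execution: a single bulk command over the whole array --
# the kind of long chain of identical multiply-accumulates that thousands
# of simple GPU cores (or a vector unit) rip through at once.
acc_parallel = np.dot(weights, inputs)
```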

Exh 6: Architectural Difference between Traditional CPU and GPU

It is important to note that AI models have two distinct phases of development (Exhibit 7). The first is training, where the learning model iterates through mountains of data, accumulating insight through the tiny self-adjustments of each algorithm, guided by the tweaks of developers, who can adjust the trajectory of the learning to better suit its objectives. In this phase, raw speed for simple calculations done in parallel is paramount, but with the flexibility to accommodate those tweaks. The second phase is inference, the technical term for a model that is in working operation. The model is still learning from new data, but it is taking inputs as they come and running them through a stable computational matrix. There are still many simple commands that need to be executed in parallel, but developer adjustments would be made offline, and the processor may need to be refocused to other jobs as needed.
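A minimal sketch of the two phases, using PyTorch with a toy model (the model, data shapes and hyperparameters are hypothetical, chosen only to show the shape of each phase):

```python
import torch

model = torch.nn.Linear(128, 10)  # toy stand-in for a learning model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Phase 1 -- training: iterate through mountains of data, each pass making
# tiny self-adjustments to every parameter. The math is massively parallel
# and repeated millions of times, which is why co-processor throughput rules.
def training_step(batch, labels):
    optimizer.zero_grad()
    loss = loss_fn(model(batch), labels)
    loss.backward()   # compute adjustments for all parameters in parallel
    optimizer.step()  # apply the tiny tweaks

training_step(torch.rand(32, 128), torch.randint(0, 10, (32,)))

# Phase 2 -- inference: the computational matrix is frozen; live inputs are
# simply run forward through it as they arrive.
model.eval()
def infer(live_input):
    with torch.no_grad():  # no adjustments -- just evaluate
        return model(live_input)

prediction = infer(torch.rand(1, 128))
```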

Exh 7: Phases of Development for AI Models

Exh 8: Traditional 2 Socket Server Platform Block Diagram

For a few years, computer scientists have used graphics processing units (GPUs) as co-processors to accelerate their AI model training. GPUs, originally designed to free the CPU from the many parallel vector calculations necessary to support digital graphics at video game speeds, play a similar role for AI. A UCLA researcher estimates that a GPU-focused server configuration may have as much as a 50:1 cost advantage over a pure CPU design with the same performance on AI tasks (Exhibit 8). Nvidia was the leader in graphics-focused GPUs and was first to catch on to the potential of its architecture for AI. Nvidia developed a version of its GPUs specifically tailored to the needs of AI training, along with a software platform that made it easy for developers to build models that make efficient use of the hardware. This vision and head start have put Nvidia in an enviable position as a de facto hardware standard for AI model development.
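The co-processing division of labor looks roughly like this in practice (a hedged PyTorch sketch; the matrix sizes are arbitrary, and the code falls back to the CPU when no GPU is present):

```python
import torch

# The host CPU merely stages data and dispatches commands; the GPU executes
# the heavy, repetitive math across thousands of simple cores at once.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.rand(4096, 4096, device=device)
b = torch.rand(4096, 4096, device=device)
c = a @ b  # one dispatched command, on the order of 10^11 floating-point ops
```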

A typical datacenter GPU board will have one or two low-level server CPUs helping to coordinate the activities of 8-16 GPUs (Exhibit 9). For example, the GPU board Microsoft designed for its Azure platform and contributed to the Open Compute initiative can have as many as four CPUs managing its 8 Nvidia Tesla GPUs, but Microsoft notes that even the two-CPU configuration is “a high-performance supercomputer in a box for Deep Learning training jobs”. (N.B. the technical note specifies the Nvidia GPU but does not specify x86 processors.) Other major hyperscale operators are believed to have similar designs, and would be free to copy Microsoft’s from open source.
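This division of labor is visible in software as well: a single modest host process coordinates every GPU on the board. As a hedged sketch (PyTorch's DataParallel wrapper is one simple such coordinator; the model and batch sizes are invented for illustration):

```python
import torch

# One host CPU process coordinating all GPUs on the board: it splits each
# batch, scatters the shards across the GPUs, and gathers the results back.
model = torch.nn.Linear(4096, 4096)
batch = torch.rand(64, 4096)

if torch.cuda.is_available():
    if torch.cuda.device_count() > 1:         # e.g., an 8-GPU board
        model = torch.nn.DataParallel(model)  # CPU-side scatter/gather wrapper
    model, batch = model.to("cuda"), batch.to("cuda")

out = model(batch)  # each GPU runs identical math on its shard of the batch
```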

Exh 9: GPU accelerated Server Platform Block Diagram