A Deep Learning Primer – The Reality May Exceed the Hype
SEE LAST PAGE OF THIS REPORT FOR IMPORTANT DISCLOSURES
Paul Sagawa / Artur Pylak
203.901.1633 / .1634
psagawa@ / firstname.lastname@example.org
May 23, 2016
Deep learning based AI will drive the next phase of disruption for TMT. Leveraging hyperscale data centers and ubiquitous mobile devices, AI systems that allow computers to interpret ambiguous inputs and find optimal solutions are enhancing human-machine interfaces, enabling autonomous machines to execute complex tasks, and addressing previously intractable analytic challenges. However, the prerequisites for leadership in AI are rare. There may be as few as 50 true experts in deep learning, concentrated in a small number of organizations. Only a few hyperscale cloud operators have the computing power critical to addressing the most promising opportunities. Similarly, deep learning systems work best with massive data sets – a big advantage for big consumer cloud franchises. Finally, experience matters – learning systems improve iteration by iteration. The top player is GOOGL, which employs 40% of the world’s top AI data scientists and has dozens of initiatives based on the technology. MSFT runs a distant second, with FB, IBM, and AMZN further behind. AAPL has the most at risk, as it struggles to catch up in a technology that could threaten its ecosystem. A raft of startups, most tightly focused on a particular application, have emerged, and many more will come, particularly as core technology is contributed to open source solutions and as the leaders begin to offer deep learning platforms as a service.
“The impact of AI on tech is like the impact of electricity was on industry” – Andrew Ng, Chief Scientist, Baidu
- Deep learning is the next phase for the Cloud-Mobile Era. Deep learning technology is poised to threaten the app-focused smartphone paradigm, to revolutionize data analysis, to establish new systems control paradigms, and to enable entirely new classes of applications while displacing traditional services across the economy. In doing so, deep learning will create larger, more immediate and more foundational business opportunities than other hyped innovations, such as VR, wearables and the IoT, while leaving TMT players without strength in the technology extremely vulnerable.
- Optimal solutions to ambiguous problems. We see four major vectors for deep learning systems development. 1. Allowing computers to cope with ambiguous input data, such as natural language, speech, raw images, or gestures. 2. Enabling the autonomous operation of machines – e.g. self-driving cars, drones, robots, or smart homes. 3. Performing highly complex analysis of very large data sets, thus supporting medical/scientific research, economic forecasting, crime/fraud prevention, resource allocation, and other significant analytic bottlenecks. 4. Enhancing personal services – anticipating user needs, offering insight, increasing personalization, adding context awareness, and performing tasks.
- What will change? AI could make the touch GUI/App interface model obsolete, not just by simplifying user input – voice assistants Siri, Google Now, Cortana and Alexa are all AI-based and can get MUCH better – but by carrying a user’s context across all devices, anticipating needs as much as responding to them. AI could dramatically improve applications – e.g. shopping applications that really know your tastes, customer service that can anticipate problems, CRM systems that can accurately qualify leads, analytics that cope with messy data and find unexpected insights, or quantitative models that adjust to changing conditions as they happen – and enable new ones that have yet to be articulated.
- The prerequisites for success are hard to gather. 1. Talent – The acceptance of the deep learning approach is relatively recent and there is still a dearth of experienced talent in the field – there may be as few as 50 real experts in deep learning worldwide. 2. Data – The more data, the better the AI. 3. Infrastructure – Deep learning is extremely computationally intensive, requiring access to massive computing resources. 4. Experience – Deep learning is a fundamentally iterative process without obvious shortcuts. These requirements greatly advantage a relatively small set of leaders. Smaller companies will either work with these companies, or will remain tightly focused on narrow applications.
- GOOGL – The giant of deep learning. #1 GOOGL likely has 40% of the top minds in deep learning, with hundreds of younger researchers working under them. They are attracted to an organization that has been working on AI from its inception and that offers the world’s most powerful computing infrastructure, an ocean of data both broad and deep, reach to more than a billion consumers, and a commitment by senior management to use deep learning to address inspiring problems. GOOGL leads in all four vectors of development, with major work long underway in natural language processing, image recognition, predictive search, autonomous machines, medical research, deep learning platforms as a service, and many other projects.
- The next 4, in order. #2 MSFT has long standing investments in deep learning – Cortana, Kinect, HoloLens, and other cutting edge product areas. It has a sizeable team able to leverage a strong computing infrastructure and an excellent base of data. #3 FB has been very aggressive, hiring experts of its own, even poaching talent from GOOGL. It has the platform and the data to support a major AI push, and has made deep learning “bots” the cornerstone of its strategy for monetizing messaging. #4 IBM is vocal about its Watson deep learning initiative and has AI pioneer Yoshua Bengio leading the effort. Its focus is necessarily more industrial, lacking the consumer data of its rivals. #5 AMZN is characteristically secretive about its research, but its recommendation engines, Alexa and drone program suggest significant investment. Obviously it has the necessary data and computing platform.
- AAPL – At risk? AAPL is playing catch up in AI, acquiring talent in deals for Siri, Perceptio, VocalIQ, and Emotient, and posting dozens of deep learning job openings. Its extreme secrecy, reluctance to capture user data, and focus on local device computing rather than cloud-based data centers are all significant impediments to progress, and its results to date with learning driven initiatives – e.g. Siri, maps, music recommendations, etc. – have been disappointing. We believe AI systems have the potential to subvert AAPL’s app-based interface model, to lessen the importance of the smartphone itself, and to erode the substantial switching barriers around the AAPL ecosystem.
- The others. BIDU poached GOOGL’s Andrew Ng to lead its deep learning efforts, and is considered ahead of Chinese rivals BABA and Tencent. All three benefit from an aggressive government initiative to develop deep learning talent in the Chinese university system. Meanwhile, a raft of deep learning startups have launched, typically with a sharp vertical or functional focus that minimizes the disadvantages of their scale. Many of these small companies have been acquired in recent months, for their talent as much as their products.
The Next Big Thing
There are a lot of “next big things” in TMT. If it’s not the Internet of Things (IoT), it’s Virtual Reality (VR) or wearables. The hype makes us cynical – Google Glass, the Apple Watch, and Oculus Rift have fizzled on our watch. Deep learning may feel like another of these “next big things”, more sizzle than steak, a disappointment waiting to happen. We think not. Deep learning based AI has been growing on us for a long time, and has enabled innovations like GOOGL’s search algorithms, MSFT’s Kinect, FB’s news feed filters, IBM’s Watson and AMZN’s Alexa. We believe that the ubiquity of smartphones, the reach of fast wireless, and the power of hyperscale cloud data centers open the door to AI-based service paradigms that could have a bigger impact than the rise of the World Wide Web.
Deep learning systems are getting better and better – quickly. Talent is in short supply, but it is concentrated in a few very serious organizations. Hyperscale data centers are a needed ingredient – building good deep learning requires a lot of horsepower. The unprecedented growth of smartphones fills another need – deep learning needs a lot of data from which to learn. Finally, deep learning systems are iterative – the longer you work on them, the better they get – so experience is the final ingredient.
A few companies have put together the recipe. GOOGL is the undisputed champion, with 30-40% of the top deep learning minds in house, the planet’s biggest and most sophisticated data center infrastructure, extraordinary rivers of data, and dozens of teams that have been hard at work for years on audacious projects like autonomous vehicles, systems to anticipate user needs, and research to stave off aging. MSFT has a strong history in deep learning, and can lever its strong position as an enterprise cloud provider. Data rich FB is aggressively moving to make up ground – poaching experts from GOOGL, collaborating with the open source community, and pushing customer service “bots” as an alternative to apps. IBM has pushed many of its remaining chips behind its Watson deep learning bet – it has expertise but a weaker platform and data position than the other leaders. AMZN is very secretive, a potential weakness in the small and close knit deep learning academic community, but has made clear progress.
These companies, and a few others – i.e. the Chinese players (BIDU, BABA and Tencent) and a raft of narrowly focused startups (likely acquisition fodder) – are chasing huge potential opportunities. Cloud based systems that take natural language inputs, anticipate needs, and follow users across contexts could marginalize the current app-and-menu, device-focused interface paradigm. Autonomous machines could revolutionize whole sectors of the economy – not just via self-driving cars. Deep learning could turbocharge big data analytics – helping to find new drugs, predict and eliminate product flaws, qualify sales leads or root out fraud. Computing becomes easier, faster, proactive, able to cope with ambiguity, and better able to find the best answer, improving with every iteration.
This is, of course, a risk for companies on the wrong side of the deep learning divide. Along with the usual suspects – traditional software firms like ORCL and SAP and wide swaths of the analog economy – we see AAPL as facing substantial threats. A new human-machine interface paradigm could greatly weaken the primacy of the smartphone and the switching barriers for the AAPL ecosystem. Of late, it has stepped up its game, making acquihires like Perceptio, VocalIQ, and Emotient and listing dozens of deep learning job offers. Still, prior efforts, like Siri, Maps and Music, have been unimpressive, and rivals like FB, AMZN, MSFT and especially GOOGL seem much, much better positioned.
What is Artificial Intelligence?
The inventors of electronic computing were fascinated by the idea of building machines that could think like humans. Alan Turing, the famed British codebreaker and computing pioneer, posited the “Turing Test” in 1950, setting the bar that humans should not be able to detect that an intelligent computer communicating with them was a machine. In 1956, John McCarthy, then a junior faculty member at Dartmouth College and later the inventor of the Lisp programming language and a revered computer science professor at Stanford, coined the term Artificial Intelligence in convening the first global conference for the leading scholars on the topic. Since then, scientists have chased the dream while science fiction, from “Star Trek”, to “2001: A Space Odyssey”, to “WarGames” and “The Terminator”, has explored the fascinating implications of computers that think (Exhibit 1).
Exh 1: Timeline of Select Artificial Intelligence Milestones, 1950-Present
AI researchers seek to surmount three major challenges that separate traditional computers from human thought. The first is searching – the computer must be able to find and classify the most relevant data as quickly as possible, separating valuable information from worthless noise. The second is reasoning – the computer must use logic and probability to work out the best possible response from many plausible ones rather than simply finding a deterministic answer. The final challenge is learning – the computer must adapt to new data and improve the quality of its responses based on context.
One of the earliest concepts in AI was creating electronic analogs of the neurons that make up the human brain, mimicking its pattern recognition and inference mechanisms. While conceptually elegant, this approach was generally viewed as a dead end. The human brain has around 100 billion neurons, while early computers counted their transistors in the thousands (Exhibit 2). While a small core of idealistic researchers continued on the “neural networking” path to AI, most computer scientists shifted to a more deterministic approach that became known as “expert systems”.
Exh 2: Moore’s Law – Processor Transistor Counts Over Time
Expert systems jumpstarted the learning process by employing human subject matter gurus to program in decision trees. For example, chess playing computer programs would have established rules for executing moves codified from a bank of data consisting of the moves made by grandmasters when presented with similar circumstances. The quality of the expert system depended on the knowledge base provided by the human experts and the ingenuity of the programmers to anticipate possible decision scenarios. Necessarily, both of these things are limited, and so are expert systems. For very complicated applications, such as understanding human language, categorizing huge bases of images and videos, or curing cancer, the expert system approach is wholly inadequate. Expert systems are unbeatable at Checkers, can compete at the Grandmaster level at Chess, but fail miserably at the Chinese game “Go”.
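To make the contrast with learning systems concrete, the following is a minimal sketch of the expert-system style described above: every rule is written by hand, so the program can only be as good as the cases its authors anticipated. The mail-sorting scenario, the function name `classify_email`, and all rules and thresholds are hypothetical illustrations, not drawn from any real product.

```python
# Hand-written decision tree in the expert-system style.
# All rules and thresholds below are hypothetical, chosen only for illustration.

def classify_email(subject: str, sender_known: bool, link_count: int) -> str:
    """Walk a fixed, human-authored decision tree and return a label."""
    text = subject.lower()
    if "invoice" in text or "payment" in text:
        # An "expert" decided that money-related mail from strangers is suspect.
        return "finance" if sender_known else "suspicious"
    if link_count > 5:
        return "suspicious"
    if sender_known:
        return "personal"
    # No rule anticipated this case, so the system falls back to a default.
    return "unknown"


examples = [
    ("Invoice for March", True, 0),
    ("Payment overdue!!!", False, 2),
    ("Lunch on Friday?", True, 0),
    ("Hello from a new colleague", False, 1),
]
for subject, known, links in examples:
    print(f"{subject!r:32} -> {classify_email(subject, known, links)}")
```

The last example falls through every rule and lands on the default label, which is the limitation the paragraph above describes: the system never improves unless a person writes another rule.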
While the computer science world was more interested in pushing expert systems to their limits, a small cadre of true believers continued working on neural networks. Centers of expertise arose at certain universities – The University of Toronto, The University of Montreal, Oxford, University College London, The University of Lugano, New York University and Stanford University stood out, mostly because a small handful of pioneers had their academic residence at these schools (Exhibit 3). Even at those institutions, neural networking was viewed as a dead end, and, according to some of the latest generation of leaders, promising computer science Ph.D. candidates were gently advised to consider alternative areas to study.
Exh 3: Academic Centers of Excellence in Artificial Intelligence
After the millennium, the circumstances began to change. The limitations of expert systems became their own sort of dead end, while the rise of the hyperscale cloud data center architecture with nearly limitless available computing power started to make neural networks much more viable. Neural networking was rebranded as “Deep Learning”, focused less on the anthropomorphism of the approach, and more on the iterative way the algorithms could sift through massive databases looking for patterns.
How Does Deep Learning Work?
Deep learning systems sift through data, using layers of software that look for patterns on successively more detailed levels. On each pass through the data, the program is given feedback on the quality of its choices at each level, feedback that informs the algorithm to make better choices on subsequent iterations. This feedback loop is closely related to reinforcement learning (strictly speaking, training on tagged examples is known as supervised learning), and it is roughly analogous to Pavlov’s famous experiment with dogs and their response to positive and negative reinforcements (Exhibit 4). As decisions at the most basic levels get more accurate, higher levels that depend on the lower level distinctions start to get better as well.
For example, consider a system intended to identify the faces of specific people in pictures. The first layer may be looking for recurring instances of pixels with similar tone in roughly oval shapes. Another layer may infer that because of two anomalies within the oval shape that often occur above a protuberance from the oval, some ovals are faces and some are not. Another layer figures out that sometimes the oval is turned to profile such that there is only one “eye” visible, but the “nose” becomes more prominent. Another layer realizes that some faces are human, while others are cats, dogs or other animals. Eventually, the system learns to differentiate the face of your Uncle Joe from that of his older brother, Uncle Bob – their faces are shaped slightly differently. At some point, the learning system can tell Uncle Joe from his twin brother, Uncle John – even relatives sometimes have a hard time – noting that Uncle Joe’s eyes are somewhat wider spread and Uncle John’s nose is crooked from a bike accident 30 years ago (Exhibit 5).
Exh 4: The Basic Reinforcement Learning Scenario
Exh 5: Hierarchy of Deep Neural Networks
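The layered, feedback-driven learning described above can be illustrated at toy scale. The sketch below is only an assumption-laden illustration, not the method of any company discussed here: it trains a plain two-layer network by gradient descent (backpropagation) on the four-row XOR problem, using tagged examples. The function name `train_xor`, the layer size, the learning rate, and the number of passes are all hypothetical choices.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_xor(epochs: int = 5000, lr: float = 0.5, hidden: int = 3, seed: int = 0):
    """Train a two-layer network on XOR; return (initial_loss, final_loss, predict)."""
    rng = random.Random(seed)
    # Tagged examples: inputs paired with the correct answer.
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Layer 1: `hidden` units, each with two input weights and a bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # Layer 2: one output unit with a weight per hidden unit and a bias.
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w1]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + w2[hidden])
        return h, y

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial = loss()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # Feedback at the top layer: how wrong was the output?
            d_out = (y - t) * y * (1 - y)
            # Feedback passed down: each hidden unit's share of the error.
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                w1[j][0] -= lr * d_hid[j] * x[0]
                w1[j][1] -= lr * d_hid[j] * x[1]
                w1[j][2] -= lr * d_hid[j]
            w2[hidden] -= lr * d_out
    return initial, loss(), lambda x: forward(x)[1]


initial_loss, final_loss, predict = train_xor()
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Real systems differ mainly in scale: many more layers, millions of examples, and specialized hardware, which is why the computing and data prerequisites discussed in this report matter so much.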
Learning about human faces at that level of accuracy may take a deep learning program many millions of iterations on many millions of tagged photographs with thousands of algorithmic layers, but once learned, the program will be able to accurately identify the faces of confirmed individuals – even in blurry photos, with poor light, and a profile view. This is the kind of AI that beats Go champions, translates spoken Mandarin Chinese to English in real time, and drives cars more safely than humans. Top computer science grad students are lining up for the top deep learning programs, with visions of seven-figure starting salaries and $100 million startup acqui-hires dancing in their heads.
We believe that the building blocks are finally in place for deep learning systems to revolutionize most of the TMT landscape, and indeed, much of the rest of the economy (Exhibit 6). A new data center paradigm, pioneered by Google and emulated by the other “Hyperscale” operators, enables almost infinitely scalable computing power at dramatically lower costs. Smartphones have proliferated to nearly worldwide ubiquity, allowing the collection of massive data sets and reach to billions of potential customers for cloud-based AI applications. Finally, the steady improvement of wireless networks, enhanced by WiFi, connects it all together at speeds and costs that work (Exhibit 7).
Exh 6: The Building Blocks of Artificial Intelligence
Exh 7: Wireless Use Cases, 4G and Beyond
What is it Good For?
The stakes are high. We believe that deep learning capabilities have the potential to improve nearly every current application for computing, while enabling wholly new ones that could disrupt major sectors of the economy. A simplistic look at the emergence of the World Wide Web begins with the introduction of the PC back in the early 1980s, follows through the spread of Ethernet networking, and ends with the rise of powerful and open clustered server data centers. As more and more people got PCs and connected them to servers, each step strengthened the next, eventually enabling an ecosystem that would support a consumer internet.
AI is poised to make a similar leap. Smartphones have grown to near ubiquity in a blink. Wireless options expanded their reach and performance to serve them. Hyperscale data center architecture made it possible to serve the massive user base with apps that could scale to the billions. Deep learning had been on the periphery of the mobile-cloud sea change, but a similar chain of enablement, like the one that spawned the World Wide Web, has led to a sturdy platform ready for AI to bring on an entirely new set of application paradigms (Exhibit 8).
Our framework assesses the potential for AI applications by looking at four major use cases – natural interfaces, autonomous control, highly complex analysis, and personalization/prediction (Exhibit 9).
Exh 8: Enablement Chains of the PC and Mobile/Cloud Generations of Computing
Exh 9: Use Cases for Artificial Intelligence
Alphabet CEO Larry Page’s dream was to create a computer like the one in Star Trek, one that could understand the meaning of questions as they were spoken and in context. Understanding human speech, with its subtleties, connotations, and imperfections, is a very difficult task, but a perfect one for deep learning systems to tackle. The other half of the Turing Test, responding in a natural manner that is indistinguishable from a human conversational partner, is the obvious counterpoint. The final frontier would be perfect instantaneous translation between languages, a task that is extraordinarily difficult, even for expert humans. Deep learning systems are already delivering excellent work on text translation and have shown dramatic progress with spoken translation. After previewing English-Mandarin translation as a scripted demo in 2011, Microsoft introduced simultaneous translation for Skype at the end of 2014, supporting 6 languages, including English and Mandarin. Reviews suggest that the service is acceptable for very simple conversations – a substantial triumph, given the complexity.
Natural language is the key to “bots”, programs that mimic humans to interact with them. Facebook and Microsoft have introduced platforms for 3rd parties to build bots to interact with consumers for customer service and advertising applications. Users would initiate “chats” with the bots, asking questions without regard to format and getting appropriate answers and actions as though they were given by an actual representative. Obviously, automated call systems have used rudimentary voice recognition and pre-recorded answers for years, but the new bots have the potential to be far, far more flexible, accurate, and human-like in their interactions. Imagine a “lawyer bot” which could handle almost any consumer legal inquiry, or a “doctor bot” which could be a first line triage for medical consultation.
Images are even more nettlesome for computers than language. Traditional machine vision, central to much precision manufacturing, focuses on a small set of known and exact parameters – the edges of a part, the socket for a chip, etc. – and does not rely on AI to do its job. Opening that vision to the unexpected requires interpreting many millions of pixels, looking for patterns. The images can be static – identifying the contents of a photograph. They can be moving – looking for anomalies in a surveillance video. They can be live – using facial recognition for identification at a point of sale.
The uses of image analysis are myriad. Categorization for search and archiving. Identification for security or connection. Interpretation for gesture controls. These functions may be an end in themselves, such as in Google Photos or Facebook’s automatic labeling of the people in a photograph, or a part of a much more complicated product, such as an autonomous robot, a point-of-sale security solution, or an augmented reality system.
Deep learning AI can allow computers to make well considered decisions about complex and ambiguous situations, making choices to optimize the likely outcome given the information at hand. It is this aspect of the technology that allowed Alphabet’s DeepMind reinforcement learning system to beat a world champion at the game Go – a milestone that many in the AI world thought was a decade away. It is this sort of system that is learning to drive autonomous cars, control industrial robots, fly delivery drones, manage “smart” homes, and run warehouse logistics or manufacturing lines without human intervention.
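For readers who want a feel for how a program can learn to make choices that optimize a likely outcome, the sketch below uses tabular Q-learning, a far simpler relative of the reinforcement learning systems mentioned above and not DeepMind's actual method. The five-cell corridor, the reward of 1.0 at the goal, and the parameter values are hypothetical. To keep the result repeatable, the sketch tries every state and action on each sweep instead of exploring at random.

```python
# Tabular Q-learning on a hypothetical five-cell corridor.
N_STATES = 5                      # cells 0..4; cell 4 is the goal
ACTIONS = {"left": -1, "right": +1}
GAMMA = 0.9                       # discount: rewards that arrive sooner count for more
ALPHA = 0.5                       # learning rate for each update


def step(state: int, action: str):
    """Environment: move one cell, stop at the left wall, pay 1.0 on reaching the goal."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def learn(sweeps: int = 200):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N_STATES - 1):          # the goal cell is terminal
            for a in ACTIONS:
                nxt, reward = step(s, a)
                if nxt == N_STATES - 1:
                    best_next = 0.0            # nothing more to earn after the goal
                else:
                    best_next = max(q[(nxt, b)] for b in ACTIONS)
                # Nudge the estimate toward reward plus discounted future value.
                q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


q_table = learn()
print(greedy_policy(q_table))
```

After enough sweeps the learned policy moves right in every cell, because the value of the goal propagates backward one cell at a time. Systems that play Go or drive cars replace the small table with a deep neural network, since the number of possible situations is far too large to enumerate.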
This has enormous implications across the economy. The potential impact of self-driving vehicles is staggering – replacing long-haul truck drivers with autopilots that can drive with perfect safety without rest could dramatically reduce accidents along with transportation costs. Subscription personal taxi services with autonomous cars could be a safer, cheaper and more convenient alternative to car ownership – helping to unsnarl traffic and freeing up parking spaces for alternative uses along the way. Robots could replace workers in hazardous jobs. Drones could deliver packages.
Autonomous control systems are viable now, but could be even more powerful combined with the proliferation of inexpensive sensors and low power 5G wireless connectivity envisioned for the “Internet of Things”. Municipalities could use learning based systems to manage traffic signals dynamically. Autonomous vehicles could communicate with each other and with the road to travel in highly efficient convoys. Energy use could be tightly managed within facilities without inconvenience to the people that live or work there. The possibilities are nearly endless.
Highly Complex Analysis
Big data analysis tools have revolutionized business intelligence, yet for the biggest, knottiest problems – drug research, epidemiology, economic forecasting, climate modeling, crime prevention, etc. – the number of variables and the complexity of their interrelationships, the quality of the data sets, the enormous range of potential outcomes, and the influence of human bias can leave even highly trained experts at loggerheads. Learning systems are relentless at mining insight from data, iterating and adjusting many millions of times to reach optimal solutions.
Drug discovery is a headline use case for deep learning. A 2015 research paper from Alphabet and Stanford scientists posits significant improvements from using deep learning to find effective molecules relative to the traditional human guided approach. Every major pharmaceutical company is looking to hire deep learning scientists, dozens of start-ups have been launched looking to leverage the technology for drug discovery, and projects within Alphabet, Microsoft and IBM are building AI tools that could be used by the industry on their platforms. Beyond pharmaceuticals, deep learning could find many other use cases within health care – e.g. triage systems, diagnostic support systems, test screening, genetic research, epidemiology, etc.
The financial industry is also ripe for AI based disruption. Learning systems could score credit, root out fraud, price loans, value assets, analyze portfolios, trade securities and model the economy. IBM has sold a pattern analysis solution based on Watson to several police forces for solving and even anticipating crimes. Essentially, in any circumstances that require decisions to be made from an avalanche of seemingly ambiguous data, deep learning could dramatically accelerate progress and drive potentially game changing innovation.
Deep learning systems are extraordinary tools for solving big problems, but they are also potential game changers for small ones. Understanding the patterns of interests and behavior across a universe of billions of users can give applications insights to better personalize service to each individual. Recommendation engines work on this principle – deep learning makes recommendations ever more likely to hit their mark. Deep learning also greatly sharpens ad targeting for both interest and context, a win-win-win for consumers, advertisers and ad networks.
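The principle behind recommendation engines, using the behavior of similar users to predict what an individual will like, can be sketched without any deep learning at all. The example below is a minimal illustration using cosine similarity between users' ratings, a classic collaborative filtering technique swapped in here for simplicity. The users, items, ratings, and function names are all hypothetical.

```python
import math

# Hypothetical ratings (1-5) by user and item; items a user has not rated are absent.
ratings = {
    "ann":  {"jazz": 5, "rock": 1, "folk": 4},
    "ben":  {"jazz": 4, "rock": 2, "folk": 5, "blues": 5},
    "cara": {"jazz": 1, "rock": 5, "metal": 5},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user: str) -> list:
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, score in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann"))
```

Here "ann" is steered toward the item favored by the user whose tastes most resemble hers. Deep learning approaches pursue the same goal but learn the notion of similarity from far richer signals than a handful of explicit ratings.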
Deep learning lets applications understand a particular user’s habits, allowing greater accuracy in responding to their requests and interpreting their intentions. Imagine an autocorrect that already knew that you refer to your sister by an unusual nickname or a reservation service that had a great handle on the sort of restaurants that you usually like. Google Now has begun to use AI to anticipate things for which you might want to search – automatically bringing up cards to remind you of upcoming concerts by bands you like, to suggest that you leave for the airport a little early to allow for particularly bad traffic and an on-time flight, or to call attention to a just published article on a topic of special interest.
Deep learning could also enhance services outside of mobile apps. Health care could be revolutionized with continuous monitoring and analysis of vital signs and with doctors using diagnostic tools personalized to the patient based on their medical history and genome. Financial planning and other personal financial services could be responsive to the specific needs of the customer amidst changing conditions without human intervention. Online education could track curriculum to the knowledge base of the student to assure better outcomes in less time, measuring achievement by knowledge gained rather than hours spent.
AI as a Service
Deep learning platforms will also be available as a hosted cloud service. Specialized AI-optimized computing based on Nvidia GPUs is already available from Amazon Web Services and Microsoft Azure (Exhibit 10). Microsoft and IBM offer a full AI development platform as a service. Google Cloud Services may top them all with a planned IaaS/PaaS capability based on its proprietary Tensor Processing Unit (TPU) ASIC hardware, and its open sourced TensorFlow application development platform. Beyond the big data research applications listed above, we believe that many more mundane business applications can be dramatically improved with AI. Lead generation in a CRM system, customer service, inventory management and logistics, product and process design, reservations systems … the possibilities are nearly endless. Beyond the top AI names – i.e. Alphabet, Microsoft, IBM, Facebook, and Amazon – most enterprise application vendors and their customers will likely depend on hosted platforms.
Exh 10: Current Artificial Intelligence-as-a-Service Offerings by Company
What are the Keys to Success?
There is a rising buzz around deep learning technology, as companies from all parts of the economy hustle to position themselves. Drug companies and banks are competing with Silicon Valley to recruit students from AI graduate programs. VCs are funding wisps of ideas, as long as there is deep learning talent at the helm. Established TMT companies are acquiring many of these startups, more to grab the scientists than their particular products. Within all the activity, we believe that there are four critical success factors for the companies looking to lead the deep learning revolution. It will take the right talent, the right data, a powerful computing platform, and time/experience to deliver these solutions as commercial reality. These ingredients are still in very short supply.
The list of true experts in deep learning is still a short one – until a decade or so ago, few of the top computer science scholars were interested in what was still viewed as an impractical approach. This kept the population of Ph.D.s versed in the science of neural networks limited. Yoshua Bengio, late of the University of Montreal and now of IBM, believes that there are only 50 or so true experts in deep learning. Most of those leading lights have been associated with six institutions that have been at the forefront, each cluster established by the presence of one of deep learning’s pioneers – The University of Montreal (Bengio – IBM), University of Toronto (Geoff Hinton – Google), Oxford/University College London (Demis Hassibis – Google), Stanford University (Andrew Ng – Baidu), New York University (Yann LeCun – Facebook), and The University of Lucerne (Jurgen Schmidhuber). Of course, the spigot for deep learning Ph.D.s has opened a bit, but it is still limited by an inability to keep faculty from jumping to industry themselves. Competition for the top brains is fierce, with some new Ph.D.s commanding seven figure guarantees and aqui-hire M&A at premium prices for the startups launched by academic stars.
We made a list of more than a thousand deep learning, machine vision and natural language processing scientists from company rosters, attendance at academic conferences, and papers published in peer-reviewed journals. We then filtered them based on the number of times their research has been cited by other scholars writing in the field. 481 different scientists had been cited at least 1,000 times, an indication of the influence that their thinking has had in the artificial intelligence community (Exhibits 11-12). A full 43.2% of these experts work for Alphabet, 21.4% for Microsoft, 12.7% for IBM, 9.1% for Facebook and 5.4% for Amazon. Looking at total citations, Alphabet and Microsoft remain at the top, but Facebook passes IBM for 3rd, a relationship that is even more pronounced if we look specifically at recent citations since 2011. Breaking the list by sub-specialty, Alphabet leads all three categories – deep/machine learning, machine vision and natural language processing – with a mix that is fairly consistent with the overall population. In contrast, IBM is unusually skewed toward language processing (Exhibit 13).
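As a rough illustration of the screen described above, the sketch below applies a 1,000-citation threshold and recomputes each company's share of the 481 scientists who pass it. The per-company counts are the ones quoted in the company sections of this report; Facebook is left out because its count at the 1,000-citation level is not stated. The function and variable names are ours, chosen for illustration only, and this is not the actual tooling behind the exhibits.

```python
# Counts of scientists with at least 1,000 citations, as quoted in this report.
TOTAL_AT_THRESHOLD = 481
COUNT_BY_COMPANY = {"Alphabet": 208, "Microsoft": 103, "IBM": 61, "Amazon": 26}


def passes_filter(citations, threshold=1000):
    """Return True when a scientist's citation count meets the threshold."""
    return citations >= threshold


def share_pct(count, total=TOTAL_AT_THRESHOLD):
    """Percentage share of the filtered population, rounded to one decimal."""
    return round(100.0 * count / total, 1)


# Share of the filtered population held by each company.
shares = {company: share_pct(n) for company, n in COUNT_BY_COMPANY.items()}

for company, pct in shares.items():
    print(f"{company}: {pct}%")
```

The recomputed shares line up with the percentages cited in the text, which suggests the figures are simple head-count ratios at the 1,000-citation threshold.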
Exh 11: Artificial Intelligence Scientists by Organization and Number of Citations
Exh 12: Artificial Intelligence Expertise by Company
Exh 13: Artificial Intelligence Specialty Mix by Company
Learning systems demand data. Enough to let the algorithms establish subtle, but important relationships within a sea of variables. Enough to give the program opportunities to cope with the unusual but not impossible circumstances that turn up in real life. Enough to build statistical confidence across a very wide range of outcomes. It seems odd today, but a decade ago, there was a relative dearth of data with which to train deep learning systems.
Now we live in a golden age of data. With more than 3 billion internet users, with smartphones in the hands of nearly 2 billion people worldwide, with popular app platforms like Facebook, Google Search, YouTube, and WhatsApp each topping a billion monthly repeat users, and with billions of daily videos viewed, photos posted, news stories shared and messages sent, there is plenty of fodder for deep learning systems to find insight. Not only that, but deep learning may be the only technique that really allows the companies that have that data to make usable sense of it.
Exh 14: Users and Available Data by Company
The companies with the biggest troves of data are well known (Exhibit 14). Alphabet, with its constantly updated index of the entire Internet, its massive archive of YouTube videos, its record of every search and video view, its petabytes of carefully vetted global maps with up-to-the-minute traffic updates, its years of stored Gmail messages and Play Store purchases, its detailed record of more than 1.5 million miles of autonomous driving, and other stores of useful demographic, location, interest and usage data, is the king. Facebook’s social graph – demographic, social connection and interest data tied to real names and email addresses, hundreds of billions of stored photos and videos, a daily flow of tens of billions of private messages, and other measures and interactions of its 1.5 billion user base carefully cataloged for analysis – is nearly as broad and extensive as Alphabet’s. These two have data resources far, far ahead of would-be competitors, with the possible exception of their Chinese counterparts, Baidu and Tencent.
Amazon and Microsoft have large and valuable data collections, albeit from smaller user bases. Amazon has a bit over 300 million active customers. Microsoft has more than a billion Office users, and will gain more data about their usage as the shift to the cloud-based Office 365 plays out. Moreover, both Amazon and Microsoft are looking to leverage their expertise in deep learning in tools for the enterprise customers of their IaaS hosting businesses, an approach that Alphabet is also pursuing. IBM, with little data of its own, has made a significant head start in applying its Watson AI to customer data. It boasts a range of successful deployments in disparate fields, like industrial operations, crime analysis and medical research.
Apple’s aggressive support for user data privacy puts it at a substantial disadvantage vis a vis deep learning systems. It may have a billion users of its platforms, but has purposely limited its own access to the usage information for its customers. This policy limits the application set against which Apple can apply its deep learning expertise. Still, Apple could amend its stance or encourage its users to opt into a data sharing arrangement.
Deep learning algorithms require a lot of processing power. They grind through massive databases looking for patterns, iterate through layer after layer of algorithms to fine tune the insights, then retune the algorithms and grind through the data again. This was the biggest impediment to neural networking and deep learning in their earlier days. No university or corporate research department had access to enough computing capacity at a low enough cost to really put the emerging technology through its paces.
Well, now they do. Alphabet, Microsoft, Amazon and IBM have begun to battle for leadership in providing deep learning platforms as a cloud service (Exhibit 15). In the commodity IaaS cloud hosting markets, pricing for basic computing and storage services has plummeted, essentially undercutting the cost of building in-house capacity. The top commercial platforms, AWS, Microsoft Azure and Google Compute Engine, have been adding graphics processor (GPU) capacity, an architecture that has proved particularly efficient for deep learning applications. This specialized hardware combined with high performance learning software platforms will give the leading IaaS operators differentiated, value-added services to be bundled atop their high performance, low cost and almost infinitely scalable infrastructure.
Exh 15: Hyperscale Cloud Infrastructure Services Market Share, 4Q15
To date, Amazon has been dominant in basic cloud hosting, with over 30% share of the market and enviable 24% operating margins, but Microsoft is growing nearly twice as fast as a still distant but capable number two. Alphabet has seriously underperformed as a commercial cloud host – it is less than a tenth the size of AWS – but substantial planned investment in expanding the reach of its commercial platform to 17 global data centers and a new commitment to the IaaS business, backed by the hiring of VMware founder Diane Greene to run it, could see it flexing its deep learning muscles as a host. IBM’s Watson cloud hosting platform does not have a massive consumer cloud operation alongside it to drive learning, scale and cost efficiency. Still, next to Microsoft, it has the best positioning with potential enterprise customers for future deep learning hosting business (Exhibits 16-17).
Facebook, of course, has its own hyperscale data center infrastructure dedicated to its own purposes. Based on the record of capital spending and on the energy efficiency of its data centers, we believe Facebook trails only Alphabet, Amazon and Microsoft in the scale of its operations and the sophistication of its architecture. Apple has been spending strongly of late, but the company still struggles with its data center performance. Recently, it moved a substantial portion of its iCloud operations onto Google’s commercial cloud platform, a decision which was likely as necessary as it was painful. This speaks poorly for its ability to lever its data centers for deep learning applications.
Exh 16: Net Plant Property and Equipment, 2010-2015
Exh 17: Capex Spending, 2010-2015
Training deep learning systems takes time. With each iteration, algorithms are tweaked, new layers are added, additional data is included, and the solution gets better. For example, Google’s self-driving car project now relies on 8 years of experience, with more than 1.5M miles of road testing, and billions of miles in simulation, to conduct its vehicles safely and efficiently through real world conditions. Still, the leaders of the project see 3-4 years of further iterations at even greater intensity, before the software is ready for fully autonomous commercial operation under the most taxing potential conditions. Other companies, looking to follow Google, will find it very difficult to make up the multi-year head start, even if they could bring the same talent, data and processing to the table (Exhibit 18).
Given that experience really matters, who has it and in what context? Once again, the list starts with Alphabet. Early Google was a pioneer in using learning algorithms to sharpen its core search service, using them to interpret natural language search queries, to translate across languages, and to recognize speech commands. As it added images and video to search, it began serious work in using deep learning to categorize and interpret these data objects as well. The core work on generalized deep learning became TensorFlow – an AI platform that was contributed to the open source community and offered as a service on Google’s commercial hosting platform. Behind its own doors, Alphabet continues to refine TensorFlow with ongoing research iterations. With this platform, Alphabet has years of work applying it to high value applications, like the aforementioned autonomous vehicles, natural language processing, speech recognition, and image/video recognition, but also including predictive search, robotics, autonomous controls, and medical research. This bank of experience is a powerful asset.
Microsoft, by virtue of playing catch up in search, has been at the deep learning game for a long time as well, with its own strength in natural language processing, speech recognition, translation, image/video recognition and other related application areas. In addition, work on the Kinect system for the Xbox gaming platform puts Microsoft in the lead on gesture interpretation and leads into the live video interpretation capabilities of the fledgling Augmented Reality HoloLens initiative. IBM also leverages a significant history in learning systems, in particular, for natural language processing – a key element of the Watson platform.
Google, Microsoft and IBM had each begun work on deep learning before Mark Zuckerberg had even begun to write code for “The Facebook” as a Harvard sophomore. Facebook’s big splash in deep learning came with the 2013 hiring of deep learning guru Yann LeCun and a crew of acolytes from NYU. The operation was dubbed Facebook Artificial Intelligence Research (FAIR), and it focused heavily on image recognition, the key driver of LeCun’s prior research. Given Facebook’s massive archives of photos and videos, this strength is a perfect match for the company’s priorities. Facebook recently introduced a platform for commercial “bots” – AI agents that can carry on simple written conversations with humans – looking to them as customer service vehicles for advertisers on its messaging platforms.
Amazon and Apple were also latecomers to AI, and much of their work has been shrouded in secrecy. Amazon had dabbled in learning software through its recommendation engines, which seek to offer products to customers based on their prior purchase history, and jumped in more officially with the introduction of its Alexa intelligent assistant in 2015. Alexa, bundled with the Echo smart speaker and some models of the Kindle Fire, is more notable for its ease of use and the momentum behind its ecosystem of partners than for the sophistication of its voice recognition. Amazon Web Services has hosted learning applications for years – support for the graphics processor (GPU) instances favored for AI has been available since 2013 – and the company began to offer a basic deep learning platform to 3rd parties last year. Arguably, AWS’s learning technology is not state of the art, and many customers, including Netflix, run their deep learning apps on AWS using 3rd party software (such as Google’s open sourced TensorFlow).
Exh 18: Timeline of AI Initiatives by Company, 2000 – Present
Apple’s experience in machine learning is limited. Siri, the intelligent assistant program bundled with iOS, is even more limited than Amazon’s Alexa in its range of response, and has a spotty reputation for the accuracy of its voice recognition. Apple’s recent AI acquisitions – VocalIQ and Perceptio – would seem to support further work in natural language recognition (Exhibit 19). We also note strong rumors of Apple’s interest in self-driving automobiles, a major area of industry wide AI research, but one in which there is no record of Apple’s previous involvement.
Exh 19: Notable AI Acquisitions since 2010
Alphabet, Microsoft, Facebook, and IBM, in that order, are the top companies in AI. Amazon and Apple are playing catch-up, but have been badly hampered by their secrecy – the top minds in deep learning expect to be allowed to continue teaching part time and to keep publishing. The top Chinese internet companies, Baidu, Tencent and Alibaba, have dramatically accelerated their investment in deep learning, aided by a government push to turn promising engineering students toward the technology. Some second tier US internet players have also been investing heavily – notably Twitter, Uber, and Yahoo. Finally, a flotilla of deep learning startups has been funded by venture capital firms – most of them are tightly focused on specific and often vertical applications, but a few are the sort of academically oriented think tanks that have been catnip for the M&A departments at Alphabet and Facebook.
#1 – Alphabet
“Artificial intelligence would be the ultimate version of Google. The ultimate search engine that would understand everything on the Web. It would understand exactly what you wanted, and it would give you the right thing. We’re nowhere near doing that now. However, we can get incrementally closer to that, and that is basically what we work on.”
Alphabet CEO Larry Page, October 2000
Alphabet is the global leader in deep learning by a huge margin. It has nearly a thousand scientists working on AI projects, and has amassed 25% of the top echelon of academic talent working in the field. It has massive bases of data, levering its billion-plus user franchises (Search, YouTube, Chrome, Gmail, Android, Maps and Play) and its aggressive investments in other data rich projects (self-driving cars, Google Photo, etc.). Alphabet invented the modern hyperscale data center architecture, and its hardware and software infrastructure remains generations ahead of all rivals in its sophistication, cost efficiency and ability to scale in both power and geographic reach. Finally, Alphabet has been betting on learning-based AI techniques from nearly its beginning, and has logged the most scientist/hours in the industry on a long list of important AI projects.
Talent – In 2012, AI pioneer and computer science icon Ray Kurzweil joined Google to work on projects in machine learning and language processing. In 2013, Google acquired DNNresearch, bringing with it Dr. Geoffrey Hinton of the University of Toronto, considered by many to be the world’s leading expert on deep learning. In 2014, it acquired DeepMind, a British think tank with 37 top deep learning scientists, including founders Demis Hassabis, Mustafa Suleyman, and Shane Legg. Later that year, it added Dark Blue Labs and Vision Factory, hiring with them the core of Oxford University’s deep learning faculty. Respected deep learning scholar Fernando Pereira, once chair of the University of Pennsylvania’s computer science department, is now the head of research. Google Fellow Jeff Dean, the acknowledged father of the company’s unstructured database architecture, has led an AI project known as Google Brain since 2011.
Google scientists account for about 35% of all academic citations in machine learning, and almost half of all citations for commercial entities, far ahead of number two Microsoft with 18% of the total. Alphabet’s research rolls include 67 scientists with at least 5,000 academic citations and 208 with at least 1,000, representing 43% of all scientists with that many citations. It lists nearly 1,000 engineers with AI credentials working on more than 100 different project teams within the company. Alphabet’s Geoff Hinton, with more than 125K citations, is by far the most cited scholar in the field. Based on comments from IBM’s Yoshua Bengio, himself one of the 3-4 most important deep learning scholars, 12 of the top 50 minds in the field were working for DeepMind at the time that Google bought it. Alphabet is far and away the #1 AI organization in the world (Exhibits 20-21).
Data – Google Search has more than 1.5B monthly unique visitors who execute more than 1.2 trillion searches a year. Google maintains the lifelong search history for each unique IP address, many of which it can link to a registered Google account. Many of those Google accounts belong to the more than 1.4B Android users, of which more than 1B have active Google Play Store accounts, which require a form of payment and a real name, and track the various apps that have been downloaded and used by each account. There is some overlap with the more than 1B global Gmail accounts, for which Google maintains a record of every email sent or received until they are specifically deleted by the user.
There are also a billion users of Google Maps, at least 100M of whom are Apple iOS users – Google knows where these users have been, where they are, and where they are thinking of going. It also has its database of digital maps, believed to have already been more than 20 petabytes in size back in 2012, detailing nearly every passable road and walkway in the world. The self-driving car initiative has logged more than 1.5M road miles in the vehicles, collecting a trove of data for every minute of operation. There are a billion Chrome users, and Alphabet can track their aggregated web site activity. Finally, the 1.3B regular users of YouTube collectively upload 300 hours and watch 75,000 hours of video every minute of every day. Google knows which viewers are watching which videos how often, which ones they abandon and which ones they watch through, where they pause and when they rewind.
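To put the per-minute YouTube figures above on a daily footing, the short sketch below scales them by the number of minutes in a day. It uses only the two rates quoted in the text; the variable names are ours and the result is simple arithmetic, not additional data.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute rates quoted in the text.
UPLOAD_HOURS_PER_MINUTE = 300
WATCH_HOURS_PER_MINUTE = 75_000

# Daily totals implied by those rates.
upload_hours_per_day = UPLOAD_HOURS_PER_MINUTE * MINUTES_PER_DAY
watch_hours_per_day = WATCH_HOURS_PER_MINUTE * MINUTES_PER_DAY

# Hours watched for every hour uploaded.
watch_to_upload_ratio = WATCH_HOURS_PER_MINUTE / UPLOAD_HOURS_PER_MINUTE

print(f"Uploaded per day: {upload_hours_per_day:,} hours")
print(f"Watched per day: {watch_hours_per_day:,} hours")
print(f"Watch-to-upload ratio: {watch_to_upload_ratio:.0f}x")
```

On these rates, viewing generates roughly 250 hours of behavioral data for every hour of new content, which is why the viewing record is at least as valuable a training resource as the video archive itself.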
Exh 20: Alphabet’s Top 10 AI Scientists
Exh 21: SSR AI Scorecard: Alphabet
Hyperscale Processing – Google invented hyperscale processing. We have written at length about its data processing innovations and advantages (http://ssrllc.com/publication/goog-a-particular-set-of-skills/, http://ssrllc.com/publication/infrastructure-as-a-service-the-race-wont-go-all-the-way-to-the-bottom/). Other hyperscale data center operations are built on Google’s ideas, contributed over the years to the open source community, and remain generations behind in sophistication. Its data centers are the most efficient. Google’s software innovations allow it to parse data more quickly and to process it more flexibly. Recently, it announced that a custom ASIC processor specifically designed for AI called the Tensor Processing Unit (TPU) has been implemented in its data centers and will be available to 3rd parties on the commercial Google Cloud Platform. It has invested most of its $30B in PP&E in building out its data processing infrastructure, and continues to outspend its rivals in expanding its capabilities, with nearly $10B in 2015 capex. It has, almost inarguably, the best and biggest computing infrastructure on the planet. How nice for the deep learning scientists who get to use it.
Time – Alphabet began working on deep learning a long time ago. As noted previously, learning concepts figure heavily into the development of the core Search technology – e.g. interpreting mistyped or ambiguous queries, anticipating entries with just a few characters, recognizing voice queries, and automatically translating between languages were all introduced before 2010 and all depend upon deep learning techniques. These natural language chops are embedded in the just announced Google Assistant, Google Home and Allo Messenger. Alphabet also has long experience in image analysis – Google Photos, now with more than 200M users, relies on state of the art image recognition to allow users to search their picture archives for specific people, places and things without the need for pre-set tags. Google patented a similar approach for identifying objects in videos in 2012, and uses it to identify copyrighted or inappropriate content in YouTube.
Google has been working on self-driving cars since 2009 – a lifetime compared with the very recent entry by most of its would-be competition. It has been working on medical applications since at least 2013, first as part of the Google X division, and then as Verily, one of the “other bets” owned by Alphabet. It has been working on robotics since buying nearly every important startup in the field in a late 2013 M&A spree. Google has also been a leader in offering its deep learning platform to 3rd parties, contributing its TensorFlow software to open source in 2015 and supporting it as a service on the Google Cloud Platform. Collectively, this is the most valuable set of prior AI work in the industry.
#2 – Microsoft
“I believe over the next decade computing will become even more ubiquitous and intelligence will become ambient. This will be made possible by an ever-growing network of connected devices, incredible computing capacity from the cloud, insights from big data, and intelligence from machine learning.”
Microsoft CEO Satya Nadella – 2015
Microsoft has been working on artificial intelligence for a long time – competing with Google on search leaves you little choice – and ultimately, that experience tips the scales vs. close #3 Facebook. The published work by its scientists covers a wide range of deep learning topics, with particular strength in natural language processing and machine vision, expertise that shows in its Cortana voice assistant and Kinect gesture recognition system. Combined with a very strong hyperscale data center infrastructure and data resources as good as any company that is not Alphabet or Facebook, Microsoft is well positioned to benefit from the emerging deep learning era.
Exh 22: Microsoft’s Top 10 AI Scientists
Talent – Microsoft has 40 scientists whose publications have been cited in academic papers at least 5,000 times, and 103 who have hit the 1,000 mark. While it remains well behind Alphabet in every measure of AI talent concentration and in every category of study, it is similarly well ahead of everyone else. Indeed, with 28 scientists with at least 10,000 citations, Microsoft and Alphabet (34) are in a class of their own for the most recognized scientists, with #3 Facebook listing just 7 such scholars (Exhibit 22). Microsoft’s talent is slightly skewed toward computer vision, but has clear strength across the board. Microsoft has also been acquisitive – Touch Type, Metanautix, and Aorato were all bought for their AI capabilities – and has been a long-standing hirer of deep learning talent for its well-regarded research arm. We rank Microsoft a clear #2 in AI talent.
Data – 1.5B people use Windows every day, with 1.2B using Microsoft Office at least once a month. With the shift to a SaaS model, Microsoft is gaining significantly more data on the usage of those products. 400M users rely on Outlook.com for email – Microsoft has an archive of those messages and a record of each user’s activity. Xbox Live has more than 50M active members – Microsoft knows a LOT about their tastes in games and entertainment. There are about 75M Skype users making hundreds of millions of monthly video calls. Compared to Google and Facebook, these numbers may not seem impressive, but relative to anyone else, Microsoft has a lot of valuable data. We rank Microsoft #4 amongst US companies in its access to AI relevant data.
Exh 23: SSR AI Scorecard: Microsoft
Hyperscale Processing – Microsoft Azure is the second largest commercial cloud hosting platform, and, while it has a third the share of number one Amazon Web Services, it is growing at nearly twice the pace. Moreover, Microsoft’s internal data centers are also substantial and state of the art, supporting Office 365, Dynamics ERP, Bing, Outlook, Skype, Xbox Live and other services. All in, Microsoft has PP&E of $15.7B, most of which is data centers. It spent $5.9B on capex in 2015 and is expected to spend another $7.6B this year. Azure added an AI-focused GPU compute capability in 2015 based on Nvidia graphics processors. We would rank Microsoft’s processing capabilities as second best, behind Google and ahead of Amazon, Facebook and IBM.
Time – Without the benefit of experience, Microsoft would rate behind Facebook on this list. By virtue of its ambitions in search, consumer devices and gaming, it has done extensive work on natural language processing and in gesture based controls. This experience shows in products like the Cortana virtual assistant, the Kinect gesture and speech controller for Xbox, and the still experimental HoloLens augmented reality system. Perhaps former CEO Steve Ballmer’s stubborn insistence on those consumer oriented products will have a payoff after all. We place Microsoft as second, behind Alphabet in its relevant experience (Exhibit 23).
#3 – Facebook
“We should not be afraid of AI. Instead, we should hope for the amazing amount of good it will do in the world. It will save lives by diagnosing diseases and driving us around more safely. It will enable breakthroughs by helping us find new planets and understand Earth’s climate. It will help in areas we haven’t even thought of today.”
Facebook CEO Mark Zuckerberg – 2016
Facebook is number 3 with a bullet, moving aggressively to try to make up distance in AI between itself and archrival Alphabet. While it had dabbled in deep learning systems for years, it jumped in with both feet in 2013, hiring pioneering scientist Yann LeCun and a team of researchers from NYU to found Facebook AI Research (FAIR). LeCun is considered the world’s leading mind on image recognition, an outstanding fit for Facebook with its massive database of user submitted photos and videos. Its investment in AI also extends to natural language processing, the core technology behind Facebook Messenger’s ambitious bot platform for commercial customer service applications.
Talent – It starts with Yann LeCun, who, with Alphabet’s Geoff Hinton, IBM’s Yoshua Bengio, and Baidu’s Andrew Ng, is considered one of the pre-eminent scientists working in AI. The FAIR team, which also includes important researchers like Vlad Vapnik, Jason Weston, Rob Fergus, and Marc’Aurelio Ranzato, is based out of New York. Facebook has 16 scientists with at least 5,000 citations, of which 7 exceed 10,000 citations, and it accounts for about 11% of the citations in our database. It has acquired AI startups Wit.ai, Surreal Vision, and Pebbles to augment aggressive hiring from top computer science graduate programs. Overall, we believe Facebook may have as many as 500 engineers with AI credentials working across the company. We rate Facebook’s talent as third overall (Exhibits 24-25).
Data – Facebook has more than 1.5B monthly users, 1.2B of whom visit the site every day. They account for 20% of the time spent on mobile devices. Based on usage growth, we believe Facebook’s database has likely grown from roughly 300 Petabytes – that’s 300 million Gigabytes – in 2014, to well over an Exabyte – that’s a billion Gigabytes – today. In that massive archive are many billions of photos and videos, along with the history of friends, clicks and likes for every one of those users, tied to their real name and accurate demographic data. The wild success of WhatsApp and Messenger, each with nearly a billion users and collectively processing more text messages than all of the wireless carriers in the world, yields another huge source of relevant data. Facebook is the only company in the world with data resources worthy of even being compared to Alphabet, but still, it is a fairly distant #2.
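The unit conversions in the paragraph above can be checked with a few lines of arithmetic. The sketch assumes decimal (SI) storage units, which is the convention the text's own conversions imply; the helper names are ours, and the growth multiple is only as good as the 300-petabyte and one-exabyte estimates it is built on.

```python
# Decimal (SI) storage units: 1 PB = 10**6 GB, 1 EB = 10**9 GB.
GB_PER_PB = 10**6
GB_PER_EB = 10**9


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GB_PER_PB


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * GB_PER_EB


size_2014_gb = petabytes_to_gigabytes(300)   # estimate for 2014
size_today_gb = exabytes_to_gigabytes(1)     # lower bound for today

# Implied growth multiple between the two estimates.
growth_multiple = size_today_gb / size_2014_gb

print(f"2014: {size_2014_gb:,} GB")
print(f"Today: at least {size_today_gb:,} GB")
print(f"Implied growth: at least {growth_multiple:.1f}x")
```

On these estimates the archive has more than tripled in roughly two years.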
Exh 24: Facebook’s Top 10 AI Scientists
Exh 25: SSR AI Scorecard: Facebook
Hyperscale Processing – Facebook has $5.7B in accumulated PP&E and spent $2.5B in 2015 capex, up 30% YoY, almost all of it for data center infrastructure. In 2015, FAIR contributed an Nvidia GPU-based hardware design for running deep learning applications to open source, noting that Facebook had increased its own investment in deploying blades on the design by more than 3 times. This commitment to supporting AI in hardware elevates the company’s position on the list from #5 to #4, leapfrogging IBM, but still well behind Microsoft and Amazon.
Time – Facebook was founded in 2004, hit 500M users in 2010, and went public in 2012. Its early days were spent managing the explosive growth of its fairly straightforward original business model, adding an advertising auction platform modeled after Google’s to monetize the platform. In 2009, it introduced the algorithmic feed, which prioritized messages based on the system wide popularity of posts and a user’s record of likes and clicks. This started Facebook on the road to AI, and the 2013 founding of FAIR. Relative to Alphabet, Microsoft and IBM, this is a late start, but Facebook has been all in ever since. Based on its scientists’ prior work, the company likely has a time advantage on some elements of image processing.
#4 – IBM
“In the future, every decision that mankind makes is going to be informed by a cognitive system like Watson, and our lives will be better for it.”
IBM CEO Ginni Rometty – 2015
Watson gets a lot of press. Coming off expert system Deep Blue’s 1997 triumph over world chess champion Garry Kasparov, IBM researchers cast about for a new challenge. Inspired by the publicity around Ken Jennings’ 74-game winning streak on the TV game show Jeopardy in 2004, the company set about creating a machine that could beat him. Because Jeopardy required precise understanding of English language clues that often contained clever wordplay, Watson was built on a neural networking architecture, run on a self-contained 90 server cluster with 2,880 POWER7 CPU threads and 16 terabytes of RAM. In January 2011, Watson finally took the stage against Jennings and fellow Jeopardy champion Brad Rutter, winning both matches and a $1M prize, which was donated to charity.
Talent – IBM may have bled talent in most other parts of its research organization, but it remains a force in AI. It has 61 scientists with at least 1,000 citations, 16 with at least 5,000, and 6 with at least 10,000, led by top industry guru Yoshua Bengio, who was hired in 2015. In total, IBM scientists were responsible for 7% of the citations in our database, giving them a firm fourth place, behind Alphabet, Microsoft and Facebook. IBM is a bit weaker in more recent work, receiving 6.3% of the citations since 2011. The publishing record of IBM scientists skews sharply to natural language processing, unsurprising given the particular focus of the company’s Watson program. IBM’s directory lists more than 350 researchers working in AI (Exhibit 26-27).
Exh 26: IBM’s Top 10 AI Scientists
Exh 27: SSR AI Scorecard: IBM
Data – IBM has little data of its own, and is unlikely to exploit its deep learning technology to launch commercial AI services of its own for individual use. Rather, it will seek to apply its Watson deep learning platform to data held by customers in a variety of target markets, such as finance, health care and government agencies. Because it has longstanding business relationships with most of the biggest organizations in these markets, it is able, at least somewhat, to sidestep its obvious deficiency in this arena.
Hyperscale Processing – Watson was originally built to be a licensed software product running on a dedicated server cluster. Since its introduction, IBM has implemented the platform as a service on its cloud infrastructure, which is underscaled relative to the top commercial players (i.e. Amazon, Microsoft and Alphabet) and losing competitive ground. Its PP&E sits at $10.7B and its 2015 capex at $3.6B, again, behind its three main rivals. Facebook, which has a smaller footprint and lower capex, may also be a step ahead of IBM in the sophistication and efficiency of its infrastructure, particularly with regard to running deep learning systems. Still, with IBM’s intriguing position with commercial customers, its infrastructure is likely not an impediment.
Timing – As previously noted, IBM has a longstanding interest in artificial intelligence dating back to checkers-playing programs in the 1970s. The 2004 decision to make Jeopardy the company’s next challenge was serendipitous, putting IBM in a strong position in natural language processing research. This remains a very important application area for deep learning, and one that will open many avenues to commercialization for the company. IBM also seems to be at the forefront of applying deep learning technology to clinical medical issues, citing work in triage and cancer treatment protocols as important successes. It has also worked with police departments and federal agencies in building crime analysis and prevention systems.
“Humanity is currently in a golden age of machine learning and artificial intelligence development. We really are at a tipping point where the progress is accelerating”
Amazon CEO Jeff Bezos – 2016
Amazon made a big splash in 2015 with its Echo speaker, featuring the now famous Alexa AI assistant. A flood of TV ads shows Alexa bantering with Alec Baldwin, answering questions, checking the weather and booking transportation. Work on Alexa had begun three years earlier in Amazon AI research centers in Cambridge (MA) and Berlin, and focused on the practical aspects of voice recognition at a distance. The result was a product that just works, and that reached the market more than a year before the first real alternative, the upcoming Google Home, is expected to arrive. This sums up the pragmatism inherent in Amazon’s approach to deep learning systems.
Talent – Amazon does not have the honor rolls of credentialed academics of its rivals. It has just 26 employees with at least 1,000 citations in relevant academic journals, 9 with at least 5,000 and 3 with at least 10,000, leaving it well behind Alphabet, Microsoft, IBM and Facebook. This is, in part, because Amazon prohibited its full-time employees from publishing any new papers in journals or presenting work at academic conferences until very recently. This has had a negative effect on its ability to attract and retain top talent, although this may improve with the company’s recent change of heart. Amazon recently acquired photo recognition startup Orbeus, after having bought speech recognition companies Evi and Ivona in 2013, and earlier this year, Jeff Bezos hosted a retreat for academic machine learning and robotics experts in Palm Springs. While this may point toward an increased profile for AI at Amazon, we still rate its talent as well behind that of those four rivals (Exhibit 28-29).
Exh 28: AMZN’s Top 10 AI Scientists
Data – Amazon does have juicy data on which to set its AI researchers. It has well over 300 million active customer accounts, including an estimated 90 million Prime customers. Amazon knows exactly who these shoppers are, what they have bought and what they have looked at without buying. It also has extensive data on the purchasing, warehousing and shipping of the millions of items in its inventory, making it the most advanced repository of logistics data on earth. We rank Amazon #3, based on the unique relevancy of this data set.
Hyperscale Processing – Amazon Web Services is the dominant commercial cloud platform, with more than a 30% share of IaaS industry revenues. While we believe that nearly half of its $21.8B in PP&E relates to Amazon’s substantial investment in fulfillment centers, its total data center footprint for both internal use and the AWS business likely slots in close to #2 Microsoft, and ahead of IBM and Facebook. This ranking would also follow from Amazon’s $4.6B in 2015 capex. We note that AWS began offering Nvidia GPU processor instances to support learning applications in 2013.
Time – Amazon has been employing robots in its fulfillment centers for several years, buying its primary supplier Kiva Systems in 2012. Arguably, it has the most advanced implementation of AI-driven commercial robots in the world. This expertise may also support Amazon’s ambitions for automated drone delivery systems, an area where it has been investing for several years. Alexa is also an excellent implementation of natural language processing technology, an area with far-reaching potential use cases.
Exh 29: SSR AI Scorecard: AMZN
What About Apple?
When Apple bought Siri in 2010, Steve Jobs was quick to assert that he had bought it for the AI rather than for the search functionality. Since then, Jobs’s vision of Siri as a true intelligent assistant has foundered a bit. In a sense, Apple has been competing in AI with both hands tied behind its back. First, the company’s steadfast refusal to collect and exploit the usage data of its loyal and widely coveted customer base limits its ability to build learning systems and the value of those systems to users. Second, Apple’s focus on local processing to the detriment of cloud-based hyperscale data centers limits the size of the ambitions that it might have in addressing user problems – an AI that needs to run on a smartphone will not be as capable as one that runs in the cloud. Both of these self-inflicted limitations, along with the culture of secrecy that prohibits company scientists from writing academic papers or presenting at conferences, much less maintaining part-time appointments teaching and supervising grad students at university labs, hamper Apple’s ability to recruit top talent. It has acquired promising AI startups, like computer vision specialist Perceptio, natural language company VocalIQ and Emotient, which is working on a system to perceive people’s emotions from photographs, but thus far has little to show for them.
Siri has been a bit of a bust, and its founders have already moved on and are readying a competing AI assistant called Viv. iPhoto, unable to match its rival’s slick automatic categorization and powerful image search tools, has been bleeding market share to Google Photos, which has reached 200M users in less than a year. The personalization tools in Apple Music and the App Store are crude next to those of Amazon or Google. Still, Apple is soldiering on – it has a skunk works operation working on self-driving cars and is rumored to view augmented reality with great interest. What does it have to work with?
Talent – Apple does not allow its scientists to submit scholarly papers, so their citation counts are limited to work published prior to employment, but even so, the numbers are very low. Only 6 have received 1,000 or more citations, with just one with a count over 5,000. That one is Gunnar Evermann, whom it nabbed in 2014 from Nuance, the company behind Siri’s voice recognition. Interestingly, Apple has not been particularly active in poaching AI talent otherwise, generally building its research staff with newly hired Ph.D.s. It has also been recently active in the M&A market, buying natural language startup VocalIQ, and image recognition companies Perceptio and Emotient. Given the lack of senior leadership and uncertain scholarly credentials of the Apple team, it rates far behind our #5 Amazon, likely trailing the troika of Chinese internet behemoths – Baidu, Tencent and Alibaba – and only slightly ahead of typical AI startups.
Data – Apple could have been a contender. It has an installed base of a billion devices, about half of which are iPhones. Apple maintains an iron grip on this installed base, strictly controlling the apps available to users. Its users are famously loyal, often employing multiple Apple devices together – Macs, iPads, Apple TV and Apple Watch in addition to the iPhone. It controls messaging for all of its iPhone users, a majority of them use Apple Maps, and all of them have a payment method on file in the App Store. This should be extremely valuable … but it’s not, because Apple refuses to capture the data about all of this and use it to personalize its services to its users. Perhaps Apple will change its policy sometime in the future, but in the meantime, it is forgoing the chance to collect this information.
Hyperscale Processing – Apple is very late to the cloud game. Despite considerable recent capital investment directed toward building out its data center capabilities, Apple’s cloud performance remains far behind that of its rivals Alphabet, Amazon, Microsoft and Facebook. Indeed, Apple recently made the move to shift some of its iCloud services off its own infrastructure and onto the Google Cloud Platform, a clear indictment of its own data center prowess. Until the scientists and engineers who are responsible for Apple’s own cloud services and infrastructure are accorded the same lofty status as its hardware designers, it seems unlikely that it will attract the sort of talent necessary to begin to fix these deficits.
Time – Apple acquired Siri in 2010, a prescient move that allowed it to deliver the voice-controlled assistant with the iPhone 4S the next year. Still, the voice recognition behind Siri was provided by a third party, Nuance, and while Apple has poached several of its engineers, the relationship remains. Apple made a splash a year ago, when leaks revealed that the company had begun work on a self-driving electric car.
Outside of the big 5 US AI leaders, China is the only other real center of deep learning excellence, spurred, in part, by an aggressive effort by Chinese universities to steer promising students into the field. While details on Chinese companies’ positioning are difficult to specify, we believe that Baidu, Tencent and Alibaba would likely challenge for spots in our list of top AI companies. In particular, Baidu, which poached Stanford deep learning guru Andrew Ng from Google to head its Silicon Valley AI labs, appears to have a strong play in the space. All three have access to the best and brightest from Chinese technical universities, deep archives of data on their Chinese customer bases, and world-class hyperscale computing facilities. We are less encouraged by the belated moves by other Asian consumer electronics giants – e.g. Samsung and Sony – to jump into the AI fray.
There is also a large number of startups in the space – CrunchBase lists 649 venture-funded companies (Exhibit 30). Many are pragmatic, using open source platforms like Alphabet’s TensorFlow to build vertical apps that do things like qualify sales leads, monitor operations, support medical diagnosis, target ads, or mine financial data. Others offer specialized AI capabilities – response monitoring, more natural and colloquial language processing, image recognition, etc. – that could be embedded across a broad set of applications. A few are more ambitious: think tanks and learning platforms that look to extend the state of the art. It is almost certain that many of these companies will be acquired, as the leaders try to corner the market for the bleeding edge, and as also-rans try to make up ground.
Exh 30: List of AI Startups