AI Assistants: The New UI Paradigm that will End the OS/App Era
SEE LAST PAGE OF THIS REPORT FOR IMPORTANT DISCLOSURES
Paul Sagawa / Tejas Raut Dessai
203.901.1633 / .901.1634
psagawa@ / firstname.lastname@example.org
June 12, 2018
AI assistants, based in the cloud, will transcend the OS/device/App-linked UI paradigm to give users consistent, flexible, and maximally convenient access to digital services and content across the disparate venues of their daily lives. While early iterations of the assistant paradigm have leaned on voice recognition as a differentiated input mechanism, spoken commands will be just one modality for assistants that will take direction from clicks, text, gestures, eye movement, and, most importantly, context, anticipating user needs as well as responding to requests. Already, AI assistants have established a beachhead in more than 50M AI-powered speakers and hundreds of millions of assistant-capable smartphones, answering questions and executing simple tasks. Increasingly, assistants recommend actions, and in time, we believe users will trust assistants to manage mundane aspects of their lives. We believe the AI assistant paradigm will bridge disparate devices and venues, commoditizing devices and obviating service-focused apps, transferring value to the assistant platform. We see few companies positioned to compete for this comprehensive UI role, with GOOGL and AMZN the clear leaders. AAPL, already behind in advancing Siri, may be critically limited by its proprietary, device-centered strategy. MSFT seems to be retrenching to position Cortana as an enterprise-specific platform. China’s Tencent has a strong position in Asia. FB has not made this a priority.
- AI Assistants are more than voice recognition. Voice commands are a powerful lead use case for AI assistants, but they are just one possible modality for the technology. Depending on the device, input could be taken from clicks, text entry, physical gestures, pictures, facial expressions, and eye movement (important for AR glasses), as well as sound. Assistants will also take cues from context – location, time of day, environmental conditions, proximity to others, messages, schedules, news items, etc. – anticipating user needs based on that context and history. Simple commands will carry more meaning, generating better responses more easily and enabling a significant step forward in functionality and convenience.
- The beachhead has been established. There are more than 50M AMZN Echo and GOOGL Home smart speakers in use. Nearly 400M Android phones support Assistant, and GOOGL handles more than a billion voice searches per day. AAPL claims 375M monthly users for its Siri AI assistant, but 3rd party reports suggest that engagement has dropped sharply over the past year, perhaps with the spread of AMZN and GOOGL devices. MSFT reports 175M monthly users for its Cortana assistant, reached through Windows 10 PCs and the ~50M users of its Xbox One gaming console. While all the top AI assistants except Siri are available on 3rd party platforms, default status on a device is a powerful advantage.
- Answering questions and executing commands. According to a study by Stone Temple, the top two uses for smart speakers are general information (cited by 60% of users) and weather (57%), with streaming music (54%) just behind. This is in line with the way the services have been marketed, featuring assistants answering questions and acting as a voice-activated remote control. Over time, the accuracy of recognizing questions and answering them appropriately has improved markedly for all the assistants, although GOOGL still has a substantial lead. Control applications are branching out from music played on the speaker itself to interfaces for nearly anything that can be connected via the internet or Bluetooth. Assistant platforms are beginning to string actions together to enable more complex requests that could apply context awareness – for example, leaving the office could cue a car service request and set the home environment for an imminent arrival.
- Customized recommendations and autonomous action. With use, AI assistants should gain insight into a user’s specific preferences and habits in various contexts. This understanding will allow them to anticipate requests and to make increasingly accurate recommendations. For example, upon receiving an itinerary for an upcoming business trip, the assistant could automatically suggest flights, hotels, ground transportation, and local restaurants to go with it, making the bookings upon approval. The AI assistant could be given automated control of various routine tasks – for example, monthly bill payments, family calendar management, or home monitoring (e.g. security, utility use, maintenance checks, etc.). The AI can also have discretion in fulfilling requests – finding the best ride option rather than simply booking an Uber, or comparison shopping across vendors.
- AI Assistants will transcend OS and disintermediate apps. AI assistants will sit in front of the device OS as the primary interface for interacting with digital information and services. One gesture might bring up a camera viewfinder, with settings established by context. Many apps will be hidden from users, accessed directly by the assistant – why open up GrubHub to order a food delivery when you can ask Alexa for a pizza? Importantly, the same assistant will be available across disparate devices – home speakers, smartphones, PCs, cars, TVs, appliances, etc. will all understand the user in the same way, have knowledge of their full history and respond to the same inputs. This will bring great power and convenience to users, while opening massive opportunities in advertising, e-commerce and services for the platforms. High-speed, low-latency 5G networks will be a key enabler for cloud-based assistants.
- GOOGL and AMZN are obvious winners. This vision of the AI assistant as the primary user interface is intrinsic to both GOOGL’s and AMZN’s strategies. We give an edge to GOOGL for its reach (e.g. 2B Android smartphones, 7 franchises with 1B+ users), its data assets (e.g. location, calendar, contacts, interests, etc.), and its leadership in AI technology. Still, AMZN is a formidable competitor, working from a head start with Alexa and an impressive established base of partners and preset tasks. Both should be able to establish strong, defensible markets for their assistants. We also note that Tencent is exceedingly well positioned to lead this opportunity in China.
- AAPL and MSFT have hurdles to overcome. While AAPL has finally opened Siri to 3rd party developers and moved to close its functionality gap with AMZN and GOOGL, it remains steadfastly device-centered, limited by available data, and restricted to its own ecosystem. We believe this will hamstring its efforts to improve the service and leave it vulnerable to disintermediation by GOOGL’s and AMZN’s broader vision. MSFT will look to leverage the base of Windows 10 PCs and Xbox consoles in the home but remains a secondary presence with consumers vs. GOOGL and AMZN. Look for it to push the technically strong Cortana for enterprise applications in the hope that success there could bleed into the consumer market. FB, the other major consumer internet platform, has not offered a strategy to compete for this opportunity.
You Talkin’ to Me?!
The iPhone upended the browser-dominated, PC-era internet paradigm, introducing Apps, specialized application programs that served as shortcuts to customized content on the web. By now, the model is familiar to all, and many companies – AMZN, FB, PCLN, et al. – have ridden the paradigm to create enormous value. However, a decade later, AI assistants appear poised to upend the App paradigm.
Public perception of AI assistants is fixated on the voice recognition element and its manifestation in the AMZN Echo and GOOGL Home speaker products, but the potential is much, much more than that. Served from the cloud, an AI assistant will one day follow users as they move about their day, offering a consistent interface customized to their needs and available from most devices with access to the web, regardless of OS. Opening an app will be an unnecessary inconvenience. Voice will be one of many modalities – text and clicks, photos and video, gestures and facial expressions, or even eye movements behind AR glasses will all be inputs, and outputs will be similarly varied to suit the environment and the nature of the request. Importantly, AI assistants will learn the preferences and habits of the user, shaping recommendations around the insights gained. They will also be able to consider context – e.g. Where is the user? What time is it? Where does the user need to go next? Who is with the user? What are the environmental conditions? What is happening in the world? – incorporating that understanding into the interaction with the user.
Thus far, the use cases have been straightforward. Voice search is the most common task, followed by requests for the weather report. Simple remote-control apps are popular – streaming music control is the number three task, with smart home tasks – set the thermostat, change the lights, etc. – increasingly popular. Voice shopping is another up-and-comer, with 40% of millennials copping to monthly online voice orders. Over time, the tasks will grow more complex, combining multiple steps and requiring data from multiple apps, and more customized, such as adjusting smart home settings in anticipation of an imminent arrival. Increasingly, AI assistants will make recommendations in tune with a user’s demonstrated preferences and anticipate their contextual needs. Done well, this will create enormous convenience for people, saving time and trouble and, often, providing useful information and services that might have slipped their minds. Eventually, users will come to rely on their assistants to automate mundane life tasks – e.g. paying monthly bills, managing a family calendar, booking travel services, or monitoring their homes.
With more complex use cases, consistency across devices and venues will be vital. A task requested on a PC should carry over to a phone, a watch, a pair of AR glasses, a car infotainment system, a smart speaker, or a TV. Each of these should have access to the same system, informed by the user’s activity in every venue. This means that the AI assistant will need to reside in the cloud. An alternative approach – basing the AI assistant on the device – fragments the experience or requires that a master device control all venues. This is limiting – processing, storage, and data access are MUCH better and cheaper in the cloud. Moreover, the device-centered approach demands much closer cooperation between device makers.
GOOGL and AMZN are both working toward this vision. While our handicapping favors GOOGL for its superior reach, data and AI capabilities, AMZN is obviously a competitive force. In China, Tencent has the inside track on the opportunity. AAPL, which is committed to a device-centered philosophy, is already behind with Siri and may be headed in the wrong direction for future development. MSFT has AI assistant bona fides with Cortana but may have to specialize in applying it to enterprise use cases.
Exh 1: Smart Assistants Development Timeline
Computer … Set a Course for Delta Vega Sector!
In the nerd culture of Silicon Valley, reverence for Star Trek runs deep, and while tricorders, warp drives and transporter rooms may be theoretically questionable aspirations, the computer that served the command deck of the Starship Enterprise has been a driving vision for many of the industry’s leading lights. Bill Gates, Jeff Bezos, Larry Page and Sergey Brin have all cited the seemingly omniscient voice answering Captain Kirk’s questions as inspiration for their careers. A computer that could understand and answer spoken requests in common vernacular and execute complex tasks on cue – this was the mission.
Apple’s Siri, introduced with the iPhone 4S in 2011, was the first big commercial voice interface. While well trumpeted and an obvious major step forward, Siri fell quite a bit short of the Star Trek standard in practice. It struggled to recognize commands consistently. It required users to follow exacting syntax. It was limited to a modest set of tasks and could answer only a narrow range of questions. After the initial excitement, actual uptake by users was ho-hum, and while Apple improved Siri over time, the company moved on to prioritize other initiatives for the iOS platform.
In November 2014, roughly three years after the birth of Siri, Amazon announced Alexa as the key to its new Echo home speaker product. Alexa came to market with support for popular 3rd party music services Spotify and Pandora, along with a list of skills – weather reports, news digests, smart home controls, etc. – that hit a sweet spot for many consumers (Exhibit 1). Lacking its own smartphone operating platform, Amazon had been looking to sidestep smartphone platform leaders Apple and Google. After its own smartphone flopped and uptake of its Kindle Fire tablet began to slow, Amazon found another way into its customers’ homes (Exhibit 2).
The Echo was a hit, selling 5 million units in its first year on the market and about 20 million in its second. Around two years into the life of the Echo, the Google Home hit the market. The Home closely mimicked the Echo in its basic concept – a self-contained speaker connected to the internet with sensitive microphones able to grab spoken commands and forward them to the AI living in the cloud. That AI was unimaginatively called Google Assistant, and it built on capabilities honed from providing voice search on Android smartphones. It also added a wrinkle – several years earlier, Google had introduced Google Now, an extension to Search that would automatically pull up media deemed likely to be of interest to the user based on their history. This capability would recommend content on the Home. Moreover, many of Google’s customers also used its email, calendar and other productivity applications – Assistant would help them access this information as well.
Exh 2: Smart Speaker Installed Base in US, March 2018
Exh 3: Top 4 Digital Assistant Offerings and Focus
Since the Home hit the market, Amazon and Google have battled – adding new functionality and signing on new partners, both with products to control and with products that could also play host to the AI assistant, while constantly improving the performance of their platforms (Exhibit 3). Just this past holiday season, Apple joined the party with its HomePod, an expensive option that emphasized its role as a speaker more than its ability to fulfill requests. Reviews of the product have not been kind, but Apple seems to have recommitted to advancing Siri, finally opening the platform to 3rd party developers and devoting a portion of its recent WWDC keynote to demoing several new capabilities.
There’s More to it than Voice
Good voice command systems use deep learning AI models to decode sounds into words (voice recognition) and then interpret the meaning of the words to find the command that they contain (natural language processing). This is hard, and the leading AI assistants lean on their voice systems as important points of differentiation. While early iterations struggled with regional accents, required commands to follow strict syntax, and even then, often failed to properly interpret requests, the latest versions are much, much better. AI assistants can handle multiple users, can pick out commands in a noisy room, and can interpret requests from colloquial spoken language. But wait … there’s more!
Exh 4: Possible Input Modalities for Digital Assistants
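To make the two-stage pipeline concrete, the sketch below is a deliberately simplified stand-in for the second stage: it takes a transcript (assumed to be the output of the voice recognition step) and maps it to a structured intent. The intent names, slot names and regular expressions are hypothetical, chosen only for illustration. The leading assistants use trained deep learning models for this step, and hand-written rules like these break down quickly on the colloquial phrasing described above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Intent:
    """Structured command extracted from a transcript (hypothetical schema)."""
    name: str
    slots: Dict[str, str] = field(default_factory=dict)


# Hand-written patterns, checked in order. Illustrative only: production
# assistants learn this mapping with trained models instead of rules.
PATTERNS = [
    ("get_weather",
     re.compile(r"\b(?:weather|forecast|rain|temperature)\b(?:.*\bin\s+(?P<city>[a-z ]+))?")),
    ("play_music", re.compile(r"\bplay\s+(?:some\s+)?(?P<query>.+)")),
    ("set_lights", re.compile(r"\b(?P<action>dim|brighten)\b.*\blights?\b")),
]


def parse_intent(transcript: str) -> Optional[Intent]:
    """Map recognized text to an intent, or None if nothing matches."""
    text = transcript.lower().strip().rstrip("?.!")
    for name, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            # Keep only the slots that actually matched.
            slots = {k: v.strip() for k, v in match.groupdict().items() if v}
            return Intent(name, slots)
    return None


print(parse_intent("What's the weather in Boston?"))
```

Returning None for unmatched input, rather than guessing, mirrors the behavior users dislike least: the assistant can ask a follow-up question instead of executing the wrong command.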
The secret sauce for AI assistants is an ability to figure out your needs from confusing and complex inputs, and while those inputs are often spoken words, they don’t have to be. Text and clicks may be old-fashioned, but Google built an $80B-a-year business on responding intuitively to typed queries. AI assistants will need to be ready for that. Cameras, taking both still and video images, are becoming major elements of search – “What’s this and where can I buy it?” – and AI assistants will have the answer. Mobile device makers have long worked to capture a wider range of inputs – e.g. gestures, facial expressions, motion, biometric sensors, and other indicators – and use them to trigger various functions in lieu of manual entry (Exhibit 4). Assistants could do that too. Someday (we think a bit further out than the Apple blogosphere does), augmented reality glasses will become a thing, and eye movement trackers may take the place of swipes and clicks – cloud-based AI assistants will be ideal for interpreting the user’s intent.
Exh 5: Key Factors Guiding Completeness of Digital Assistant Systems
And that’s not all! The AI assistant will be aware of your context. Where are you? At home? At the office? In a car? At the airport? In a theater? At a restaurant? Walking down the street? Location will make a difference in interpreting a user’s intent (Exhibit 5). Today, that means “dim the lights” dims the lights in the room where you are sitting, but in the future, it might mean automatically pulling up your boarding pass as you approach the airport gate. What time is it, and what is on your calendar? Who is with you? What is happening in the world? The answers to any or all of these questions may have a dramatic bearing on your needs, and the AI assistant will interpret your requests in that light. We think this will be a big deal.
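A minimal sketch of how context might resolve the “dim the lights” and boarding-pass examples, assuming the assistant already holds a hypothetical context record (location, room, local hour). The field names, the brightness levels and the airport-gate rule are illustrative assumptions, not any vendor’s actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Context:
    """Hypothetical snapshot of what the assistant knows about the user right now."""
    location: str            # e.g. "home", "office", "airport_gate"
    room: Optional[str]      # room the user is in, when known
    hour: int                # local hour of day, 0-23


def resolve_dim_lights(ctx: Context) -> dict:
    """Resolve the ambiguous command "dim the lights" using context."""
    if ctx.location == "home" and ctx.room:
        # Assumption: dim further late in the evening than during the day.
        level = 20 if ctx.hour >= 21 else 40
        return {"action": "set_brightness", "target": ctx.room, "level": level}
    # Without enough context, ask rather than guess.
    return {"action": "ask", "prompt": "Which lights would you like to dim?"}


def proactive_cards(ctx: Context) -> List[str]:
    """Surface content the user is likely to need, without being asked."""
    cards = []
    if ctx.location == "airport_gate":
        cards.append("boarding_pass")
    return cards


print(resolve_dim_lights(Context("home", "living_room", 22)))
print(proactive_cards(Context("airport_gate", None, 8)))
```

The same spoken words produce different actions depending on where the user is and what time it is, which is the point of the paragraph above; a real assistant would draw on far richer signals than these three fields.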
Act now! The AI assistant need not wait for you to ask for something. Knowing your context and your historical preferences, it may reasonably anticipate certain requests. Google is already doing this, delivering a newsfeed of media stories tuned to your revealed interests on the far-left screen of its most recent Android version. An email proposing a slate of meetings across several cities might spur your assistant to propose flight, ground transportation, hotel and restaurant options for your review. A simple yes would set it to making all the appropriate reservations. Getting up from your desk at 6:30PM could prompt the assistant to call you an Uber, then automatically adjust your smart home settings when you are 5 minutes from arrival. Eventually, you might trust your AI assistant to take over mundane aspects of your monthly routine – paying bills, managing your family’s calendar, or monitoring your home (e.g. security, energy use, etc.).
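The 6:30PM example above can be expressed as a pair of simple event rules. The sketch assumes hypothetical event names (“left_desk”, “eta_update”), a fixed 6:30PM cutoff and a 5-minute arrival window; a production assistant would presumably learn such thresholds from the user’s history rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class Event:
    """Hypothetical sensor/app event seen by the assistant."""
    kind: str                  # "left_desk" or "eta_update" (illustrative names)
    at: time                   # local time the event occurred
    eta_minutes: float = 0.0   # minutes until arrival home (eta_update only)


EVENING_CUTOFF = time(18, 30)  # assumption: the user's usual end of workday
ARRIVAL_WINDOW_MIN = 5         # prepare the home this many minutes before arrival


def proactive_actions(events: List[Event]) -> List[str]:
    """Apply two simple rules to an ordered stream of events.

    1. Leaving the desk at or after the cutoff triggers a ride request.
    2. Once a ride is underway, an ETA within the arrival window triggers
       the home "arrival" settings, at most once.
    """
    actions: List[str] = []
    ride_requested = False
    home_prepared = False
    for ev in events:
        if ev.kind == "left_desk" and ev.at >= EVENING_CUTOFF and not ride_requested:
            actions.append("request_ride")
            ride_requested = True
        elif (ev.kind == "eta_update" and ride_requested and not home_prepared
              and ev.eta_minutes <= ARRIVAL_WINDOW_MIN):
            actions.append("set_home_arrival_scene")
            home_prepared = True
    return actions


commute = [
    Event("left_desk", time(18, 35)),
    Event("eta_update", time(18, 50), eta_minutes=20),
    Event("eta_update", time(19, 5), eta_minutes=4),
]
print(proactive_actions(commute))  # ['request_ride', 'set_home_arrival_scene']
```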
Operators are standing by! A cloud-based AI assistant will be available to you from nearly any device that is connected to the internet. A task initiated on one device will follow you to another, and another, and another. Each device will benefit from learning across all your devices, and all will have access to all your data.
Even though you may have your smartphone with you, this saves the time and effort of syncing two systems or breaking continuity to check your phone. It also means that the assistant can work across disparate operating systems, if the appropriate app exists, customizing the experience consistent with preferences you have expressed elsewhere. This is a powerful user interface paradigm. 5G wireless networks will be an important enabler of this, providing low-latency, high-bandwidth connectivity as needed (Exhibit 6).
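The continuity described here follows from keeping task state on the server rather than on any one device. The sketch below uses an in-memory dictionary as a stand-in for a cloud service; the class, method and field names are hypothetical. Whichever device a user signs in from reads the same task record, so a trip booking begun on a PC can be picked up on a phone.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskState:
    """One in-progress task, stored server-side (hypothetical schema)."""
    task_id: str
    description: str
    steps_done: List[str] = field(default_factory=list)
    last_device: str = ""


class CloudSessionStore:
    """In-memory stand-in for a cloud service holding each user's task state.

    Because the state lives with the service rather than on a device, every
    device the user signs in from sees the same in-progress tasks.
    """

    def __init__(self) -> None:
        # user -> task_id -> TaskState
        self._tasks: Dict[str, Dict[str, TaskState]] = {}

    def record_step(self, user: str, device: str, task_id: str,
                    description: str, step: str) -> TaskState:
        """Record progress on a task from whichever device the user is on."""
        tasks = self._tasks.setdefault(user, {})
        state = tasks.setdefault(task_id, TaskState(task_id, description))
        state.steps_done.append(step)
        state.last_device = device
        return state

    def open_tasks(self, user: str) -> List[TaskState]:
        """Return the user's tasks; the answer is the same on every device."""
        return list(self._tasks.get(user, {}).values())


store = CloudSessionStore()
store.record_step("alice", "pc", "trip-1", "Book Chicago trip", "flights_selected")
store.record_step("alice", "phone", "trip-1", "Book Chicago trip", "hotel_booked")
for task in store.open_tasks("alice"):
    print(task.description, task.steps_done, "last on", task.last_device)
```

A device-resident design would need each device to sync its copy of this state with the others; holding a single copy in the cloud removes that step, which is the trade-off the report draws between the cloud-based and device-centered approaches.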
Exh 6: Elements of Success for Mass Deployed Digital Assistant Systems