Augmented Reality: Smartphone AR Won’t Be the Next Big Thing
Paul Sagawa / Tejas Raut Dessai
203.901.1633 / .553.9827
psagawa@ / firstname.lastname@example.org
SEE LAST PAGE OF THIS REPORT FOR IMPORTANT DISCLOSURES
September 24, 2017
AAPL touts its ARKit developer API, which allows 3rd parties to build rudimentary augmented reality apps for iOS, as a major advance. Developers are enthusiastic, and analysts and pundits expect AR to be a key driver of revenues for AAPL. We are skeptical for several reasons: 1. ARKit (and GOOGL’s very similar ARCore) will offer very limited AR functions – coordinating video input with location and motion sensors to establish a fixed perspective, and identifying flat planes against which images might be displayed. 2. Months after the announcement (and more than a year after GOOGL’s Tango AR HW solution), no killer apps have been identified. 3. Using AR smartphone apps will be cumbersome and battery-draining. 4. Recent advances intended to drive iPhone share gains through app advantages (e.g. 3D Touch, Apple Pay) have had very limited success. 5. AAPL’s time-to-market advantage vs. Android is short, limiting the potential for ARKit to drive device switching. As such, we see smartphone AR as a modest niche market, with more capable, bulky and expensive glasses-based solutions focused on high-value vertical enterprise applications. We believe consumer AR glasses, which will require major progress in optics and may rely on low-latency 5G connections to the cloud, are at least a decade from wide adoption.
- Smartphone AR functionality is limited. AAPL’s ARKit (and GOOGL’s similar ARCore) rely on Visual Inertial Odometry (VIO) – a combination of image processing and inertial sensors – to establish the location and orientation of the phone in a roughly room-sized environment. Both use 2D plane detection to find flat surfaces onto which digital objects can be projected. As the phone moves about the space, the digital objects, which may be animated, can be rendered to reflect the changing perspective. However, the system cannot interpret context, so the digital objects cannot reflect an understanding of the environment or interact with it. Syncing multiple devices to view the same AR content from their own perspectives is not yet possible.
- Smartphone AR will be cumbersome. ARKit and ARCore application users will view the world via their small smartphone screens, with a limited field of view. Users will have to hold their smartphones at eye level while using the applications. Speed and range of movement might have to be limited to facilitate image rendering. Both APIs generate a room-sized internal 3D map into which digital objects may be projected; movement outside of that space will require a new map, disrupting wider-area use cases. Use of the video camera and processor will likely drain batteries.
- Killer apps are not apparent. GOOGL introduced its Project Tango in June 2014, with hardware and software specifications to facilitate AR development. Tango, which included precise HW distance reckoning, was not widely adopted by OEM partners and has inspired modest application development interest. Moreover, the proposed applications were unimpressive. ARKit and ARCore, far less ambitious than Tango, have not unearthed proposed use cases that are any more exciting. Gaming experiences are likely to be badly compromised by lag and the low resolution of images. Retail apps, particularly furniture shopping, have been popular demos, but seem at best a small niche – how many consumers are likely to feel the need for an Ikea app? The most intriguing use cases all demand contextual understanding and/or mapping integration that will not be supported by the new AR APIs.
- Proprietary app functions haven’t delivered upside. In the early days, AAPL derived substantial competitive advantage from the many apps available on iOS and not on Android. This has not been the case of late. Users no longer download many new apps, and the apps that capture almost all user engagement are available on both platforms. Recent moves to differentiate on app functionality (e.g. 3D Touch, Apple Pay) have seen very modest user uptake, with no sign that they drove share gains or meaningful service revenues. AAPL’s window of advantage with ARKit may have little payoff.
- AAPL has the advantage … for now. ARKit and ARCore are very similar in their functionality and underlying tech. Still, AAPL, unlike GOOGL, controls a homogeneous hardware architecture, which gives it an advantage at the start. AR apps will need to be precisely calibrated to the performance of the sensors in the phone, which are uniform across the iPhone installed base and heterogeneous across Android’s. As such, early applications will likely appear for iOS users well before they are available for Android. GOOGL has begun to exert stronger guidance over its OEM partners on other system parameters, and we would expect it to quickly marshal ecosystem support for more standard sensor configurations.
- Competition for AR as a future platform will be fierce. Beyond ARCore and Project Tango, GOOGL is the world leader in machine vision AI, with 100 scientists with >1K citations. Its work on 3D maps for autonomous cars could also offer relevant insights. MSFT’s HoloLens is widely viewed as the most advanced glasses-based AR product. It is supported by a strong AI team with 45 machine vision/AR scientists. Facebook acquired virtual reality pioneer Oculus in 2014, taking its roster of scientists with relevant background to more than 40. AMZN is working on its own smart glasses with Alexa voice assistant integration. These will not offer AR at the start, but the company has 18 AI scientists with >1K citations and a vision/AR background. AAPL has 10 cited scientists in the field, although we acknowledge that the company’s longstanding publishing prohibition has likely muted the recognition of its employees’ work. All of these companies see AR as a long-term priority.
- AR glasses will be an industrial product for years. The optics needed for glasses-based AR are expensive and bulky, appropriate for very high-value enterprise applications – high-tech repair, surgery, etc. – but many years from the form factors, price points and social acceptance necessary for consumer adoption. Both MSFT and GOOGL have focused their glasses AR efforts on enterprise opportunities. With no evidence that AAPL is investing in proprietary optics research, we are skeptical of reports of glasses as a new iOS form factor in the foreseeable future.
- Future consumer AR will be driven from cloud platforms. We believe mass-market consumer AR will need to be glasses-based and fully context aware, able to insert sophisticated digital content that interacts with the specific location in view. The 3D mapping, image recognition and complex rendering will favor cloud-based solutions and high-speed, low-latency wireless networks (5G). We believe this infrastructure will be available within the 10-year time frame needed to bring AR optics to price points and form factors appropriate for consumer applications.
AAPL’s June announcement of its ARKit API for smartphone augmented reality (AR) applications sparked a firestorm of AR enthusiasm amongst analysts, pundits and optimistic developers. GOOGL followed two months later with its own ARCore, a software-only version of its “Tango” reference design for AR Android smartphones. The two specifications are markedly similar, although ARKit is viewed as an easier path for developers, due to the homogeneity of the huge base of iPhones. We are skeptical that this will yield any meaningful benefit for AAPL, and believe that the consumer market for AR is likely still many years out.
First, the functionality provided by ARKit (and the very similar ARCore) is modest. A technique called VIO combines camera and inertial sensor inputs to establish location relative to identifiable points in the field of view. Thus, within a finite space (about the size of a large room), an AR app can fix a digital image viewable in 3 dimensions from any perspective. Another technique allows the app to find flat surfaces within the field of view so that the image might appear to be standing on a floor or table. The API does not help interpret the rest of the image, so the digital objects do not meaningfully interact with anything in view, nor does the system understand context. The map must be regenerated as the user moves beyond the finite space, potentially disrupting the orientation of digital objects, and the API is not explicitly linked with GPS or digital maps. While many demos suggest the potential for multiplayer games in which players view the same content in the same context, this is not initially supported, and the technical hurdles to doing so are considerable.
Use of the initial apps will be cumbersome – users will hold their smartphones at eye level, movement will be limited to facilitate rendering, graphics response times will be slow, and the apps will likely drain batteries quickly. Meanwhile, after 3 years of GOOGL’s Tango and 3 months of ARKit, the proposed applications are underwhelming, relying more on “isn’t that cool?” than on “isn’t that useful?”. AR furniture shopping will never be more than a tiny niche market. The most interesting proposed use cases – pedestrian navigation or multiplayer gaming, for instance – require capabilities that are not even in the initial APIs. Maybe someone will come up with a true “killer app” that will drive wide and enthusiastic engagement with smartphone AR, but we haven’t seen anything close.
AAPL has a real time-to-market advantage vs. GOOGL. AR apps must be precisely calibrated to the specific sensor hardware in a smartphone – easy for AAPL with proprietary control of its devices, but much harder for GOOGL, which will focus on supporting its own Pixels, recent-vintage Samsung flagships, and a few others. With time, the gap will close, and GOOGL has its own advantages (mapping, image recognition, etc.) that could become important. Furthermore, previous attempts to exploit proprietary app functionality (3D Touch, Apple Pay, etc.) have not yielded noticeable benefit to AAPL.
The AR field is already crowded, and GOOGL, MSFT, FB and AMZN all have longer rosters of industry-recognized machine vision and AR experts than AAPL. While we are cautiously optimistic about industrial AR, featuring expensive and bulky headgear to address high-value vertical applications, we believe the tipping point for adoption of mass-market consumer AR is many years away. By then, sleek, affordable AR glasses will use 5G to tap the cloud to interpolate context-relevant digital content directly into a user’s field of view, potentially threatening the smartphone as the primary consumer access device. It is way too early to call a winner. It is also way too early to expect consumer AR to really take off.
Apple’s spring WWDC developers’ confab is the setting for the company’s annual OS software releases, which are usually accompanied by a splashy new product or two. This year, the most intriguing announcement was an “Application Programming Interface” or API called ARKit, designed to give developers a turnkey template for writing augmented reality applications that would run on most iOS devices. The success of Pokémon Go, which featured an extremely rudimentary AR experience (which most players subsequently turned off to conserve battery power), had already stoked developers’ collective imagination around smartphone-based AR, and enthusiasm ran high.
Many proposed use cases for ARKit harkened back to Google’s 2014 Project Tango initiative, which outlined a reference design for AR specific hardware – e.g. optical rangefinders, etc. – along with a similar software API. While few OEMs have offered Tango compliant gear, more than a few developers had kicked the tires in developing prototype apps for the standard. Soon after Apple’s ARKit extravaganza, Google revealed that it had retooled Tango as a software-only API called ARCore.
The two APIs have very similar functionality (Exhibit 1). First, the software takes input from both the video camera and inertial sensors to firmly establish the phone’s physical location relative to specific points in the field of view – a technique called Visual Inertial Odometry (VIO) (Exhibit 2). As the camera moves, the software builds a rough 3D map of the immediate surroundings – an area roughly the size of a large room – and digital objects can be pinned and oriented to specific locations. As the user moves around the map, those digital objects can be observed through the display from various perspectives. Second, the API functionality can also identify flat surfaces – floors, tables, walls, etc. – that might serve as anchors for digital content. The objects can then be observed as standing on the table or stuck to the wall, as opposed to floating in free space. Finally, the API informs the application of lighting conditions in the environment, enabling digital shadows that give objects depth and more lifelike appearance.
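Neither company’s plane-detection internals are described here, so the sketch below swaps in a common textbook technique, RANSAC plane fitting, to illustrate how a flat surface such as a tabletop can be pulled out of the scattered 3D feature points that a VIO system tracks. The function name, the 1 cm threshold, the iteration count and the synthetic point cloud (z as height) are all assumptions for illustration, not ARKit or ARCore behavior.

```python
import math
import random


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def fit_plane_ransac(points, iterations=200, threshold=0.01, seed=0):
    """Hypothetical RANSAC plane fit (illustrative, not either API's method).

    Returns (unit_normal, offset, inliers) for the plane n . p = offset
    that has the most points within `threshold` metres of it.
    """
    rng = random.Random(seed)
    best_normal, best_offset, best_inliers = None, None, []
    for _ in range(iterations):
        p1, p2, p3 = rng.sample(points, 3)      # 3 points define a candidate plane
        n = cross(sub(p2, p1), sub(p3, p1))
        length = math.sqrt(dot(n, n))
        if length < 1e-9:                        # collinear sample: no unique plane
            continue
        n = (n[0] / length, n[1] / length, n[2] / length)
        offset = dot(n, p1)
        inliers = [p for p in points if abs(dot(n, p) - offset) < threshold]
        if len(inliers) > len(best_inliers):     # keep the best-supported plane
            best_normal, best_offset, best_inliers = n, offset, inliers
    return best_normal, best_offset, best_inliers


# Synthetic scene (assumed): a tabletop at 0.75 m plus random clutter points.
rng = random.Random(1)
table = [(rng.uniform(0, 1), rng.uniform(0, 1), 0.75 + rng.uniform(-0.002, 0.002))
         for _ in range(100)]
clutter = [(rng.uniform(0, 1), rng.uniform(0, 1), rng.uniform(0, 2)) for _ in range(30)]
normal, offset, inliers = fit_plane_ransac(table + clutter)
```

The recovered normal points almost straight up and the inliers sit at table height. Note what the fit does not tell the app: whether the surface is a table, a floor or a shelf, which is the contextual gap discussed below.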
Yes, But …
The ARKit and ARCore approaches are expedient in that they work without requiring hardware beyond what is available in existing premium smartphones, but there are significant compromises as well. Perhaps the biggest drawback is the inability to understand the context of the field of view. The APIs do not identify specific objects or locations, and the images generated by the apps cannot interact with the real world or reflect their surroundings. This is theoretically possible with integration to powerful cloud-hosted image recognition AI APIs, but neither AAPL nor GOOGL has offered this to developers. Another limitation is that the anchor points for the digital images are unique to each user, making it impossible for multiple devices to view the same object from different perspectives. Unless this is resolved, multiplayer games will be severely handicapped.
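The multi-user limitation follows from each device defining its own map origin when its session starts. The sketch below is a hypothetical 2D simplification (a rotation about the vertical axis plus a translation on the floor plane), not either API’s mechanism, showing that an anchor’s coordinates only mean the same place to a second device if the rigid transform between the two maps is known.

```python
import math


def a_to_b(point_in_a, pose_b_in_a):
    """Express a floor-plane point from device A's map in device B's map.

    pose_b_in_a = (yaw_radians, tx, ty): where B's origin sits in A's frame
    and how far B's axes are rotated. Then p_B = R(-yaw) (p_A - t).
    """
    yaw, tx, ty = pose_b_in_a
    dx, dy = point_in_a[0] - tx, point_in_a[1] - ty
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)   # rotate by -yaw


def b_to_a(point_in_b, pose_b_in_a):
    """Inverse transform: p_A = R(yaw) p_B + t."""
    yaw, tx, ty = pose_b_in_a
    c, s = math.cos(yaw), math.sin(yaw)
    x, y = point_in_b
    return (c * x - s * y + tx, s * x + c * y + ty)


# Hypothetical session: B started 1 m away and turned 90 degrees relative to A.
pose = (math.radians(90), 1.0, 1.0)
anchor_in_a = (2.0, 1.0)                    # A pins a virtual object here

correct_in_b = a_to_b(anchor_in_a, pose)    # coordinates B should use
naive_in_a = b_to_a(anchor_in_a, pose)      # where the object lands if B reuses A's numbers
misplacement = math.dist(naive_in_a, anchor_in_a)
```

In this toy case, reusing A’s coordinates unchanged puts the object about 2.8 m from where A sees it. Estimating the relative pose between independently built maps is the hard part, and it is exactly what the initial APIs leave to developers.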
ARKit and ARCore also restrict the range in which fixed digital objects can be viewed and their locations retained. Moving out of range loses the reference points for the 3D map, and a new map must be created. In this context, re-establishing the map and re-rendering content could cause unacceptable lags for users, perhaps necessitating limits on range or speed. This will be a challenge for applications intended for physical settings beyond the size of a large room. Within the map, the field of view at any given time is restricted by the camera and display, creating a “keyhole” effect that will also constrain the user experience.
We are also concerned that using AR applications on a smartphone could prove cumbersome. Aligning the camera to the digital content will likely require devices to be held at eye level, a step beyond the more discreet and comfortable head-down browsing currently typical for public content consumption. While Apple has promised considerable improvement, early AR smartphone apps have also been major battery busters. With the video camera and display on, and heavy computation taxing the CPU, the power draw will very likely be significantly higher than in more typical use.
What is it Good For?
Analysts and pundits have been positing use cases for ARKit from the moment it was announced, adding to the armchair imagineering conducted on behalf of Google’s Project Tango. While many of the proposed apps sound impressive, few of them will be possible with the first iteration of either ARKit or ARCore and many of them may never be practical given the inherent limitations of smartphone-based AR. Moreover, in many cases, the proposed role for AR is largely cosmetic – e.g. watching entertainment as an AR object on your tabletop vs. watching the same content as simple video animation. We believe that users will tire of these applications once the initial novelty wears off.
The most commonly cited application area is games. Unfortunately, the processing demands for rendering 3D content into a moving video stream will badly hamper response times relative to non-AR games, typically a major no-no in game development. Moreover, the lack of true multiplayer support will also be a drawback. Finally, battery drain will be an issue. AR will have to contribute unique gameplay value for gamers to accept these shortfalls. The AR aspect of Pokémon Go drew considerable interest at launch, but most players long ago turned off the capability, which adds nothing to the game beyond novelty and proved to be a battery killer (Exhibit 3).
Another focus for AR is retail. Ikea seems to be the poster child for ARKit, with its furniture shopping app front and center in every discussion of potential use cases. Reality check – furniture shopping is a very low-frequency activity for consumers (Exhibit 4). In fact, consumer downloads of apps of any kind have stalled – most consumers do not download even a single new app in any given month. The most downloaded retail app on the App Store is Amazon, all the way down at #22. We are skeptical that the value potentially added by smartphone AR to most retail categories is significant – for most categories, seeing what the object might look like in a specific spot in the home is hardly necessary. One clever use case for ARKit is SmartPicture’s virtual tape measure, which allows a user to accurately measure the dimensions of a room, creating a digital floor plan in the process that could be used to plan renovations or evaluate furnishings. This is interesting and impossible without AR, but then again, the target market would seem quite small.
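Once an AR framework has placed points on a detected floor plane, the tape-measure use case reduces to simple geometry. The sketch below assumes, hypothetically, that the app has already captured the room’s corners as 3D points in metres with y as height; it computes wall lengths and the floor area with the shoelace formula. It is an illustration only, not SmartPicture’s implementation.

```python
import math


def wall_lengths(corners):
    """Length of each wall, walking the corners in order and closing the loop."""
    n = len(corners)
    return [math.dist(corners[i], corners[(i + 1) % n]) for i in range(n)]


def floor_area(corners):
    """Shoelace formula on the horizontal (x, z) coordinates; y is height (assumed)."""
    n = len(corners)
    twice_area = 0.0
    for i in range(n):
        x1, _, z1 = corners[i]
        x2, _, z2 = corners[(i + 1) % n]
        twice_area += x1 * z2 - x2 * z1
    return abs(twice_area) / 2.0


# Hypothetical captured corners for a 4 m x 3 m room.
room = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 0.0, 3.0), (0.0, 0.0, 3.0)]
lengths = wall_lengths(room)   # [4.0, 3.0, 4.0, 3.0]
area = floor_area(room)        # 12.0 square metres
```

The shoelace formula also handles non-rectangular rooms (an L-shaped plan works the same way), which is what makes a captured floor plan useful beyond a single measurement.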
Education apps are another common proposal. It is not clear that enabling students to use their phones to project objects into a classroom via AR offers any unique value; the same materials would seem to work as well in a generic setting as an interactive animation. Navigation is another head-scratcher. Any value generated by projecting directions directly into a live image of a street is negated by the need to view the world through the keyhole of a phone screen held at eye level. Furthermore, GPS in its current state offers accuracy of only about 15 feet (N.B. new chips with 1-foot resolution may be available to phone makers in 2018) – a real potential application killer for current users. Someday, smart glasses will project all kinds of useful information into our view as we walk about, but ARKit city maps seem a solution in search of a problem.
Some have proposed enterprise use cases for smartphone AR. With some – employee training modules or 3D collaboration tools – the AR is clearly a novelty bolt-on. For many others – 3D product design tools, repair technician support, emergency/military simulations, remote monitoring, etc. – smartphone AR is inadequate to the job. We believe head mounted displays (HMD), such as smart glasses, with more sophisticated AR platforms will prove much better suited to these applications. This is where Microsoft has aimed its HoloLens AR initiative, and where Google has re-engaged its Glass project.
Finally, Snapchat, with the success of its AR filters for its messaging app, is often raised as the ultimate proof of concept. We see a few issues here. First, there are signs of filter fatigue – pasting rainbow vomit or bunny ears onto the people in photos and videos has lost much of its novelty. Second, Snapchat’s filters depend on image identification – finding faces and pasting the digital content specifically on facial features – and this is not a part of ARKit. Finally, Snapchat built its app capability WITHOUT ARKit, and it’s not clear how it might be improved using the new API.
With the global community of iOS developers engaged with ARKit, new applications are popping up daily. Perhaps a real killer app, one that makes iPhone users open their wallets and gets Android users to switch, will emerge. So far, we are very skeptical.
The Last Next Big Things
Over the years, Apple has released numerous APIs to developers offering access to unique functions available on iOS devices – Siri, HealthKit, HomeKit, Apple Pay and 3D Touch come to mind. In most cases, Google has followed with similar capabilities in Android, holding Apple’s advantage to a short window (Exhibit 5). In other cases, such as 3D Touch, user reception to the functionality was tepid at best. In no case does it seem that these APIs had any meaningful impact on brand switching or upgrade cycles. The 2015 iPhone 6 super-cycle was unambiguously driven by the introduction of large-screen form factors, while strong share gains in 2013 were driven by China Mobile’s iPhone launch.
In the early days of the iPhone, the availability of apps was a significant differentiator. Cool new apps would appear in the App Store many months before they were available on Android, IF they were made available on Android at all. However, this gap has largely closed as users have grown blasé about apps. For the average user, their 6 favorite apps account for more than 90% of engagement. Of the top ten apps by usage, six are Google apps and three are Facebook apps. More than 90% of new apps are used just once or not at all after download. The top-grossing apps are overwhelmingly available on both major platforms.
Apple is hoping that ARKit changes this scenario, driving monetizable app downloads, more frequent upgrades and platform switchers from Android. If previous API introductions are any guide, it will be disappointed.
ARKit vs. ARCore
Apple and Google’s augmented reality APIs are very similar, but from a developer’s perspective there are important differences. ARKit’s most important asset is the tight homogeneity of iOS devices. VIO, the technology at the heart of both APIs, must calibrate image data with data from the inertial sensors in the phone. These sensors estimate location by measuring directional acceleration relative to gravity. However, these estimates are imprecise, with errors that compound with time as motion is tracked. A VIO system adjusts for these errors to keep digital objects fixed in the locations established by visual anchors in the image stream. Because Apple has implemented the same sensor suite in all recent iOS devices, adjusting for these errors is simplified, and by all accounts, ARKit provides exceptionally stable location reckoning. In contrast, Google’s many licensees are not at all coordinated in their sensor configurations. Each different model must be separately calibrated, dramatically slowing a potential roll-out. As such, Google has focused on the flagship devices from the most popular Android brand, Samsung, and its own Pixel line for initial ARCore support. From launch, ARKit will be supported by any iPhone with an A9 or later processor (iPhone 6s/SE and newer) upgraded to iOS 11 – a population expected to reach 325M by year end, with a potential base of more than 500M devices. Despite a global base of well over 2 billion Android smartphones, Google will find it very difficult to meet its goal of 100M supported devices at launch, given the lack of standard sensor configurations across OEMs. For a developer, this is a significant difference (Exhibit 6).
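The compounding inertial error described above can be illustrated with a toy one-dimensional simulation. The sketch assumes a phone that is actually stationary, an accelerometer with a hypothetical constant bias of 0.05 m/s², and a crude periodic “visual fix” that resets position and velocity to their true values. Real VIO fuses the two data streams continuously (typically with a filter or optimizer) rather than resetting, so this shows only why visual anchors bound the drift.

```python
def simulate_drift(bias=0.05, dt=0.01, seconds=10.0, fix_every=None):
    """Dead-reckon a phone that is actually stationary (toy model).

    The accelerometer reports only a constant bias (m/s^2, assumed).
    Integrating it twice makes position error grow roughly as
    0.5 * bias * t^2. If fix_every is set, a visual anchor resets position
    and velocity to the true values (zero) every fix_every steps, a crude
    stand-in for VIO. Returns the largest absolute position error seen.
    """
    velocity = 0.0
    position = 0.0
    worst = 0.0
    steps = round(seconds / dt)
    for step in range(1, steps + 1):
        velocity += bias * dt        # integrate (biased) acceleration
        position += velocity * dt    # integrate velocity
        worst = max(worst, abs(position))
        if fix_every and step % fix_every == 0:
            velocity = 0.0           # camera confirms the phone has not moved
            position = 0.0
    return worst


unaided = simulate_drift()                 # no visual correction
anchored = simulate_drift(fix_every=100)   # visual fix once per second
```

With these assumed values, unaided dead reckoning drifts about 2.5 m in ten seconds, while a fix every second keeps the error near 2.5 cm. Calibration matters because a better-characterized sensor leaves a smaller residual bias to correct in the first place.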
On a technical basis, ARCore does have some advantages over ARKit, largely owing to Google’s three years of experience with Project Tango (Exhibits 7 and 8). Google’s 3D mapping is considered more sophisticated, allowing for a larger working area, smoother transitions as the user moves out of a map area and a facility for remembering previous anchor points as the user returns to a familiar setting. ARCore’s tools for evaluating sources of light within an image, enabling shadows and shading that make digital images more realistic, are viewed as superior. However, these strengths seem minor in the broader scheme.
Longer term, Google’s advantages might be more compelling. Google’s image recognition capability is the best in the world – consider the sophisticated labeling in Google Photos or the ability of Google Translate to quickly render street signs in another language. Future versions of ARCore could link to cloud-based APIs that could yield augmented reality applications with an understanding of their location context. Google also has the most detailed and comprehensive digital maps, including 360-degree image files for many geographies. These could also be assets to future ARCore apps.
A Crowded Field
Apple and Google are not the only companies devoting resources to augmented reality. Microsoft may be the world leader, based on its enterprise-focused HMD-based HoloLens initiative. Facebook acquired virtual reality pioneer Oculus in 2014 and offers a range of AR image and video filters through its Instagram business. Snapchat introduced its rudimentary Spectacles product last year to support its AR-laced messaging platform. While Amazon’s rumored smart glasses project is expected to be a vehicle for its voice-driven Alexa platform, the company certainly has the AI scientific horsepower to layer AR onto its platform.
As AR applications grow more sophisticated, they will demand more support from AI-based systems that will help to interpret the real-world context and to intuit user needs and intentions. To that end, we identified the roster of scientists with citations for work in AR and computer vision at 16 top companies in the field (Exhibit 9). To no one’s surprise, Google leads with 96 employees with at least 1,000 citations, 30 of whom have more than 5,000. Facebook (and its Oculus subsidiary) is next with 42/18 (1K/5K), just ahead of Microsoft at 41/18. Apple, which has beefed up its roster of talent after axing its previous restrictive policy on academic collaboration and publishing last year, now has 18/4, in a cluster with IBM (17/8) and Amazon (17/3). We believe that these numbers are reflective of the size and quality of the scientific and engineering resources each company can bring to bear on AR.
What about AR Glasses?
Despite the Google Glass debacle, HMDs are still expected to be a major platform for AR. The concept easily captures the imagination – Hollywood has used it in movies from “The Terminator” to “Mission: Impossible”, showing protagonists able to pull up information about the things they see as they see them. However, the current state of the art for projecting images via smart glasses is expensive, bulky and technically limited. Systems capable of interpolating very sophisticated digital objects directly into a wide field of view are available as headset-based platforms costing thousands of dollars.
The biggest obstacle is optics. The basic concepts behind the hardware are straightforward. Images are projected by chip-based micro displays – LCOS (liquid crystal on silicon), DLP (digital light processing), and OLED (organic light emitting diodes) are the primary competing technologies. The projections are transmitted into the field of view via optronic waveguides etched into transparent glass-like substrates that allow the user to see through to the real world while also perceiving the digital images (Exhibit 10). To give 3-dimensional depth, the system splits the projection into multiple focal planes, each perceived at a different distance from the user. Only 3 or 4 focal planes are necessary, as the human brain fills in the gradations between the planes as it interprets the image, but each focal plane requires its own waveguides. To achieve full color, three waveguides per focal plane are necessary. Each waveguide is a layer of substrate, so a working AR system will need 9-12 layers, each with a micro display driver. The multiple waveguide layers present two challenges: they degrade the clarity of the user’s view of the real world, and keeping the projected images precisely aligned is an engineering headache. Cheaper systems may skimp by eliminating color or by limiting focal depth, but these would be considerable compromises in functionality for many potential use cases.
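The layer count above is simple multiplication. The sketch restates it and shows how the cheaper configurations mentioned (dropping color or limiting focal depth) cut the count. It assumes, as the paragraph does, exactly one waveguide per color channel per focal plane, and ignores any other layers a real display stack might need; the function name is hypothetical.

```python
def waveguide_layers(focal_planes, color_channels=3):
    """Substrate layers needed when each focal plane has one waveguide per color channel."""
    if focal_planes < 1 or color_channels < 1:
        raise ValueError("need at least one focal plane and one color channel")
    return focal_planes * color_channels


full_color = [waveguide_layers(planes) for planes in (3, 4)]   # [9, 12]
monochrome = waveguide_layers(4, color_channels=1)              # 4 layers, no color
single_plane = waveguide_layers(1)                               # 3 layers, no depth cues
```

Each configuration trades optical complexity for capability: the monochrome stack keeps depth but loses color, while the single-plane stack keeps color but renders everything at one apparent distance.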
Processing needs of the system are also considerable. The lag associated with current-generation wireless networks is largely unacceptable for augmented reality solutions that must track head movements. Prototype systems able to locally analyze context from video images and project digital content that interacts with that context require a backpack to house the batteries and electronics. This is far, far from ready for consumer use cases.
However, there are industrial applications where a system requiring a helmet and a backpack with a five-figure price tag will not be an obstacle – those high-tech repairmen, military simulators, and 3D product design systems come to mind. There are other enterprise use cases where a simple heads-up display able to float a legible 2-dimensional black-and-white image off to the side could be invaluable – customer service agents viewing account profiles or field service reps following notes on a complaint ticket. We believe that this is the near-term future for HMD AR technology. With time, high-end systems will come down in price and in bulk, and low-end systems will grow more capable. Eventually, low-latency, high-speed 5G wireless networks will be widely deployed, allowing smart glasses to tap the enormous processing power and storage capacity of the cloud. At that point, probably a decade from now, AR glasses will make the jump to the consumer mass market (Exhibit 11).
What will AR glasses be like in a decade? We see a form factor like ordinary glasses – the electronics will have to be small enough to fit into the frame. This will be tricky, particularly given the power needs of such a system and the slow evolution of battery technology. The system will take commands by voice, gesture, eye movement, and context, tied into a cloud-based AI assistant which will be the primary UI. It will be connected via 5G and Wi-Fi, with images processed and digital objects generated from the cloud or, in Apple’s case, from a local connection to a smartphone.
There will be cultural opposition to the adoption of AR platforms. Social mores on how and when to wear AR glasses will have to evolve, and regulations to protect privacy will be implemented. Early adopters may face social stigma, akin to the derogatory references to “Glass-h*#$s” used for Google Glass testers. Still, the utility possible with AR is considerable, and we expect the early phase of opposition to give way to begrudging acceptance with time. However, that time might be well past 2030.
How to Invest in AR
We believe that it is far too early to make money in AR. We don’t expect ARKit or ARCore to drive returns for their creators. We don’t believe a killer app for smartphone-based AR is forthcoming. We do think that industrial AR will be a small, but growing and profitable field. Eventually, companies that make breakthroughs in the cost and quality of AR optics will emerge, but we don’t know who they are yet. Our best advice: wait out the hype.