How Apple will use AirPods and data science to create the world's most powerful bot

Apple's virtual assistant Siri is about to be the most well-fed data science algorithm on the planet.

Over the past year, Siri hasn't looked good compared to her peers in the bot world -- Facebook and Amazon have opened up their bot platforms to developers, while Siri has continued to be handcuffed to a handful of limited use cases. But with the announcement of Apple's new wireless earbuds, called AirPods, earlier this month, it looks like Siri is about to be exposed to the daily activities of Apple users all over the globe.

AirPods let Siri eat all the data

Apple's dual hardware/software AirPod launch will, if the smartbuds become as ubiquitous as the iPhone, allow Siri to jump from novelty to the preeminent A.I. bot platform in the world. Since Apple clearly intends for users to leave their buds in throughout all of their daily activities, from work to commuting to travel to exercise, Siri will be able to make suggestions at any point based on geolocation, a particular iPhone activity, a Bluetooth pairing (e.g. with a car or entertainment system), and more.

Collecting the language inputs and responses from users as they interact with Siri across activities will potentially give Apple data that surpasses what Google and Amazon are already collecting. As one example, an unnoticeable Siri in your ear would allow for positive or negative responses to different prompts -- like a nod or shake of the head -- to be collected. The data resulting from a few head nods per user per day alone could boost the robustness of Apple's supervised learning models on a massive scale by providing the company with a new wealth of training data sets.
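To make the head-nod idea concrete, here is a minimal sketch in Python of how yes/no gestures could become labeled examples for supervised learning. Everything in it is an assumption for illustration: the gesture names, the `Interaction` record, and the sample prompts are hypothetical and do not reflect any actual Apple API or data format.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical gesture names; nothing here reflects an actual Apple API.
GESTURE_LABELS = {"nod": 1, "shake": 0}


@dataclass
class Interaction:
    prompt: str   # what the assistant suggested
    context: str  # e.g. "commuting" or "exercise"
    gesture: str  # "nod", "shake", or anything else (treated as unlabeled)


def to_training_examples(interactions):
    """Turn raw interactions into (features, label) pairs, dropping unlabeled ones."""
    examples = []
    for item in interactions:
        label = GESTURE_LABELS.get(item.gesture)
        if label is None:
            continue  # no clear yes/no signal, so no training label
        examples.append(((item.prompt, item.context), label))
    return examples


def acceptance_rate_by_context(examples):
    """Fraction of positive labels per context: a deliberately crude summary."""
    totals, positives = Counter(), Counter()
    for (_, context), label in examples:
        totals[context] += 1
        positives[context] += label
    return {context: positives[context] / totals[context] for context in totals}


# Made-up sample log, for illustration only.
log = [
    Interaction("Play your running playlist?", "exercise", "nod"),
    Interaction("Play your running playlist?", "commuting", "shake"),
    Interaction("Read new messages?", "commuting", "nod"),
    Interaction("Read new messages?", "exercise", "none"),
]

examples = to_training_examples(log)
rates = acceptance_rate_by_context(examples)
print(len(examples), rates)
```

A real system would feed pairs like these into a trained model rather than a per-context average, but the point stands: each nod or shake is a free label attached to a prompt and its context.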

What a bot needs

Bots need data. Specifically, they need data in the form of user interactions to drive the algorithms that improve their utility, both for consumers and designers. These interactions fuel improvements to the underlying vocabularies, or corpora, of self-improving "intelligent" bots. For instance, Google's Smart Reply uses hundreds of millions of email messages to improve a bot that suggests single-sentence replies to emails.

Obviously, that's a lot of data to improve one small bot. Couple massive datasets with machine learning algorithms and the computing requirements quickly rise, leading to the need for deep learning platforms like Google's TensorFlow. There's speculation that Apple must have similar deep learning up its sleeve, but compared with Google, Microsoft, and others, it's kept quiet on the subject.

Apple competitors like Amazon, Facebook, and other giants are methodically capturing the widest range of consumer data to feed their bots. Amazon has invested substantially in gathering data through Echo: in its pricing, its support of the developer community, and its continuous improvements. Amazon's Echo SDK project is geared toward funneling its core data (consumer purchasing behavior) into bots that can intelligently engage using human language. Facebook has open-sourced its M artificial intelligence platform as wit.ai in the hopes of supporting the exponential increase in A.I.-enabled bots. To date, Facebook's Messenger has well over 11,000 bots, with over 45,000 developers signed up on its natural language-enabled open source platform. Messenger and related platforms from IBM (Watson), Amazon (Alexa), and Google (Smart Reply, Google Now) help these companies collect as much language interaction as possible.

Toward generative bots for all?

OpenAI, a nonprofit backed by Amazon, Y Combinator, Elon Musk, Peter Thiel, and others, recognized the need for a collaborative process to enable a generative approach to artificial intelligence. Apple's move allows for the collection of interactive language responses across a nearly unlimited range of behaviors, and has the potential to collect and organize a new corpus of responses to stimuli ranging from driving speed to -- when coupled with an Apple Watch -- stress levels. Scarlett Johansson's A.I. character Samantha from Her, HAL from 2001: A Space Odyssey, and other fictional Turing test-beating bots must rely on some amount of generative modeling in order to respond to the unpredictable directions that humans can take a conversation.

A question that will be increasingly important in the world of natural language user interfaces is, Who can access the data? Facebook and Microsoft, through projects like wit.ai and botframework, are encouraging developers to access user language input in as many ways as possible. In the pre-AirPod world, Apple decided to limit Siri's domains in order to maintain consistency in her output. But, as the available data expands and the race towards smarter bots continues, there are clear benefits to widening the scope of interactions by further democratizing Siri's SDK.

Apple's competitors have decided to open-source both their data analytics platforms and language processing tools in order to further end-user adoption. So how will Apple's new language-based data trove strike a balance between privacy, intellectual capital, and technological progress? Wait a month for the AirPods to be widely adopted ... then ask Siri.
